(57) Abstract

A cellulose- or hemicellulose-degrading enzyme which is derivable from a fungus other than Trichoderma or Phanerochaete,
and which comprises a carbohydrate binding domain homologous to a terminal A region of Trichoderma reesei cellulases,
which carbohydrate binding domain comprises amino acid sequence (a) or a subsequence thereof capable of effecting binding of
the enzyme to an insoluble cellulosic or hemicellulosic substrate.

AN ENZYME CAPABLE OF DEGRADING CELLULOSE OR HEMICELLULOSE

FIELD OF INVENTION

The present invention relates to a cellulose- or hemicellulose-degrading
enzyme, a DNA construct coding for the enzyme, a
method of producing the enzyme, and an agent for degrading
cellulose or hemicellulose comprising the enzyme.

BACKGROUND OF THE INVENTION

Enzymes which are able to degrade cellulose have previously
been suggested for the conversion of biomass into liquid fuel,
gas and feed protein. However, the production of fermentable
sugars from biomass by means of cellulolytic enzymes is not yet
able to compete economically with, for instance, the production
of glucose from starch by means of α-amylase, owing to the
inefficiency of the currently used cellulolytic enzymes.
Cellulolytic enzymes may furthermore be used in the brewing
industry for the degradation of β-glucans, in the baking
industry for improving the properties of flour, in paper pulp
processing for removing the non-crystalline parts of cellulose,
thus increasing the proportion of crystalline cellulose in the
pulp, and in animal feed for improving the digestibility of
glucans. A further important use of cellulolytic enzymes is for
textile treatment, e.g. for reducing the harshness of cotton-containing
fabrics (cf., for instance, GB 1 368 599 or US
4,435,307), for soil removal and colour clarification of
fabrics (cf., for instance, EP 220 016) or for providing a
localized variation in colour to give the fabrics a "stone-washed"
appearance (cf., for instance, EP 307 564).

The practical exploitation of cellulolytic enzymes has, to some
extent, been set back by the nature of the known cellulase
preparations, which are often complex mixtures of a variety of
single cellulase components, and which may have a rather low
specific activity. It is difficult to optimise the production
of single components in multiple enzyme systems and thus to
implement cost-effective industrial production of cellulolytic
enzymes, and their actual use has been hampered by difficulties
arising from the need to employ rather large quantities of the
enzymes to achieve the desired effect.

The drawbacks of previously suggested cellulolytic enzymes may
be remedied by using single-component enzymes selected for a
high specific activity.

Single-component cellulolytic enzymes have been isolated from,
e.g., Trichoderma reesei (cf. Teeri et al., Gene 51, 1987, pp.
43-52; P.M. Abuja, Biochem. Biophys. Res. Comm. 156, 1988, pp.
180-185; and P.J. Kraulis, Biochemistry 28, 1989, pp. 7241-7257).
The T. reesei cellulases have been found to be composed
of a terminal A region responsible for binding to cellulose, a
B region linking the A region to the core of the enzyme, and a
core containing the catalytically active domain. The A region
of different T. reesei cellulases has been found to be highly
conserved, and a strong homology has also been observed with a
cellulase produced by Phanerochaete chrysosporium (Sims et al.,
Gene 74, 1988, pp. 411-422).

SUMMARY OF THE INVENTION

It has surprisingly been found that other fungi, which are not
closely related to either Trichoderma reesei or Phanerochaete
chrysosporium, are capable of producing enzymes which contain
a region which is homologous to the A region of T. reesei
cellulases.

Accordingly, the present invention relates to a cellulose- or
hemicellulose-degrading enzyme which is derivable from a fungus
other than Trichoderma or Phanerochaete, and which comprises a
carbohydrate binding domain homologous to a terminal A region
of Trichoderma reesei cellulases, which carbohydrate binding
domain comprises the following amino acid sequence

 1                                  10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa

            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa
Xaa

or a subsequence thereof capable of effecting binding of the
enzyme to an insoluble cellulosic or hemicellulosic substrate.
"Xaa" is intended to indicate variations in the amino acid
sequence of the carbohydrate binding domain of different
enzymes. A hyphen is intended to indicate a "gap" in the amino
acid sequence (compared to other, similar enzymes).

In the present context, the term "cellulose" is intended to
include soluble and insoluble, amorphous and crystalline forms
of cellulose. The term "hemicellulose" is intended to include
glucans (apart from starch), mannans, xylans, arabinans or
polyglucuronic or polygalacturonic acid. The term "carbohydrate
binding domain" ("CBD") is intended to indicate an amino acid
sequence capable of effecting binding of the enzyme to a
carbohydrate substrate, in particular cellulose or
hemicellulose as defined above. The term "homologous" is
intended to indicate a high degree of identity in the sequence
of amino acids constituting the carbohydrate binding domain of
the present enzyme and the amino acids constituting the A
region found in T. reesei cellulases ("A region" is the term
used to denote the cellulose (i.e. carbohydrate) binding domain
of T. reesei cellulases).

It is currently believed that cellulose- or hemicellulose-degrading
enzymes which contain a sequence of amino acids which
is identifiable as a carbohydrate binding domain (or "A region")
based on its homology to the A region of T. reesei cellulases
possess certain desirable characteristics as a result of the
function of the carbohydrate binding domain in the enzyme
molecule, which is to mediate binding to solid substrates
(including cellulose) and consequently to enhance the activity
of such enzymes towards such substrates. The identification and
preparation of carbohydrate binding domain-containing enzymes
from a variety of microorganisms is therefore of considerable
interest.

Cellulose- or hemicellulose-degrading enzymes of the invention
may conveniently be identified by screening genomic or cDNA
libraries of different fungi with a probe comprising at least
part of the DNA encoding the A region of T. reesei cellulases.
Due to the intraspecies (i.e. different T. reesei cellulases)
and interspecies homology observed for the carbohydrate binding
domains of different cellulose- or hemicellulose-degrading
enzymes, there is reason to believe that this screening method
constitutes a convenient way of isolating enzymes of current
interest.

DETAILED DISCLOSURE OF THE INVENTION

Carbohydrate binding domain (CBD) containing enzymes of the
invention may, in particular, be derivable from strains of
Humicola, e.g. Humicola insolens, Fusarium, e.g. Fusarium
oxysporum, or Myceliophthora, e.g. Myceliophthora thermophila.

Some of the variations in the amino acid sequence shown above
appear to be "conservative", i.e. certain amino acids are
preferred in these positions among the various CBD-containing
enzymes of the invention. Thus, in position 1 of the sequence
shown above, the amino acid is preferentially Trp or Tyr. In
position 2, the amino acid is preferentially Gly or Ala. In
position 7, the amino acid is preferentially Gln, Ile or Asn.
In position 8, the amino acid is preferentially Gly or Asn. In
position 9, the amino acid is preferentially Trp, Phe or Tyr.
In position 10, the amino acid is preferentially Ser, Asn, Thr
or Gln. In position 12, the amino acid is preferentially Pro,
Ala or Cys. In position 13, the amino acid is preferentially
Thr, Arg or Lys. In position 14, the amino acid is
preferentially Thr, Cys or Asn. In position 18, the amino acid
is preferentially Gly or Pro. In position 19, the amino acid
(if present) is preferentially Ser, Thr, Phe, Leu or Ala. In
position 20, the amino acid is preferentially Thr or Lys. In
position 24, the amino acid is preferentially Gln or Ile. In
position 26, the amino acid is preferentially Gln, Asp or Ala.
In position 27, the amino acid is preferentially Trp, Phe or
Tyr. In position 29, the amino acid is preferentially Ser, His
or Tyr. In position 32, the amino acid is preferentially Leu,
Ile, Gln, Val or Thr.
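
The positional preferences above amount to a pattern against which a candidate sequence can be checked. The following is a minimal sketch only and is not part of the original disclosure: the names `FIXED`, `PREFERRED` and `check_cbd` are hypothetical, and the input is assumed to be already aligned to the 33-position consensus, with "-" marking a gap.

```python
# Sketch: check an aligned CBD sequence against the consensus and the
# preferred residues listed in the text. All names are illustrative.

# 1-based positions that are invariant in the consensus sequence.
FIXED = {3: "Gln", 4: "Cys", 5: "Gly", 6: "Gly", 11: "Gly", 15: "Cys",
         21: "Cys", 25: "Asn", 28: "Tyr", 30: "Gln", 31: "Cys"}

# Preferred residues at variable ("Xaa") positions, as listed in the text.
PREFERRED = {
    1: {"Trp", "Tyr"}, 2: {"Gly", "Ala"}, 7: {"Gln", "Ile", "Asn"},
    8: {"Gly", "Asn"}, 9: {"Trp", "Phe", "Tyr"},
    10: {"Ser", "Asn", "Thr", "Gln"}, 12: {"Pro", "Ala", "Cys"},
    13: {"Thr", "Arg", "Lys"}, 14: {"Thr", "Cys", "Asn"},
    18: {"Gly", "Pro"}, 19: {"Ser", "Thr", "Phe", "Leu", "Ala"},
    20: {"Thr", "Lys"}, 24: {"Gln", "Ile"}, 26: {"Gln", "Asp", "Ala"},
    27: {"Trp", "Phe", "Tyr"}, 29: {"Ser", "His", "Tyr"},
    32: {"Leu", "Ile", "Gln", "Val", "Thr"},
}


def check_cbd(sequence):
    """Return (fixed_mismatches, non_preferred) as lists of 1-based positions.

    `sequence` is a whitespace-separated string of three-letter codes,
    aligned to the consensus; "-" denotes a gap.
    """
    fixed_mismatches, non_preferred = [], []
    for pos, res in enumerate(sequence.split(), start=1):
        if pos in FIXED:
            if res != FIXED[pos]:
                fixed_mismatches.append(pos)
        elif pos in PREFERRED and res != "-" and res not in PREFERRED[pos]:
            non_preferred.append(pos)
    return fixed_mismatches, non_preferred


# First specific CBD sequence given in the disclosure.
A1 = ("Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu "
      "Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys Leu")

print(check_cbd(A1))
```

A gap at a variable position (such as position 19, where the residue is described as optional) is skipped, whereas a gap at an invariant position is reported as a mismatch; that choice is an assumption of this sketch.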

Examples of specific CBD-containing enzymes of the invention
are those which comprise one of the following amino acid
sequences

Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu
Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr Cys Val
Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys Cys Gln
Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr Cys Ala
Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln Cys Thr
Pro;

Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys Cys Ser
Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln Cys Leu
Asn;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn Cys Glu
Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln Cys
Ile;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn Cys Glu
Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys Val
Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Gln Asn Tyr Ser Gly Pro Thr Thr Cys Lys
Ser Pro Phe Thr Cys Lys Lys Ile Asn Asp Phe Tyr Ser Gln Cys
Gln; or

Trp Gly Gln Cys Gly Gly Asn Gly Trp Thr Gly Ala Thr Thr Cys Ala
Ser Gly Leu Lys Cys Glu Lys Ile Asn Asp Trp Tyr Tyr Gln Cys Val

The cellulose- or hemicellulose-degrading enzyme of the
invention may further comprise an amino acid sequence which
defines a linking B region (to use the nomenclature established
for T. reesei cellulases) adjoining the carbohydrate binding
domain and connecting it to the catalytically active domain of
the enzyme. The B region sequences established so far for
enzymes of the invention indicate that such sequences are
characterized by being predominantly hydrophilic and uncharged,
and by being enriched in certain amino acids, in particular
glycine and/or asparagine and/or proline and/or serine and/or
threonine and/or glutamine. This characteristic structure of
the B region imparts flexibility to the sequence, in particular
in sequences containing short, repetitive units of primarily
glycine and asparagine. Such repeats are not found in the B
region sequences of T. reesei or P. chrysosporium, which contain
B regions of the serine/threonine type. The flexible structure
is believed to facilitate the action of the catalytically
active domain of the enzyme bound by the A region to the
insoluble substrate, and therefore imparts advantageous
properties to the enzyme of the invention.
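
The enrichment described above can be made concrete by counting residues. The following is a minimal sketch under stated assumptions: `ENRICHED` and `enriched_fraction` are hypothetical names, and the example sequence is one of the glycine/asparagine-repeat B regions listed in this disclosure, chosen purely for illustration.

```python
# Sketch: fraction of a linker made up of the residues named in the text
# (Gly, Asn, Pro, Ser, Thr, Gln). Names are illustrative only.
ENRICHED = {"Gly", "Asn", "Pro", "Ser", "Thr", "Gln"}


def enriched_fraction(sequence):
    """Return the share of residues that belong to the ENRICHED set."""
    residues = sequence.split()
    hits = sum(1 for res in residues if res in ENRICHED)
    return hits / len(residues)


# One of the B region sequences given in the disclosure (42 residues).
B_REGION_EXAMPLE = (
    "Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn "
    "Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly "
    "Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu"
)

print(f"{enriched_fraction(B_REGION_EXAMPLE):.2f}")
```

For this sequence, 40 of the 42 residues fall in the enriched set; only the alanine and the leucine near the C-terminal end do not.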

Specific examples of B regions contained in enzymes of the
invention have the following amino acid sequences

Ala Arg Thr Asn Val Gly Gly Gly Ser Thr Gly Gly Gly Asn Asn Gly
Gly Gly Asn Asn Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro
Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Cys
Ser Pro Leu;

Pro Gly Gly Asn Asn Asn Asn Pro Pro Pro Ala Thr Thr Ser Gln Trp
Thr Pro Pro Pro Ala Gln Thr Ser Ser Asn Pro Pro Pro Thr Gly Gly
Gly Gly Gly Asn Thr Leu His Glu Lys;

Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn
Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly
Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu;

Val Phe Thr Cys Ser Gly Asn Ser Gly Gly Gly Ser Asn Pro Ser Asn
Pro Asn Pro Pro Thr Pro Thr Thr Phe Ile Thr Gln Val Pro Asn Pro
Thr Pro Val Ser Pro Pro Thr Cys Thr Val Ala Lys;

Pro Ala Leu Trp Pro Asn Asn Asn Pro Gln Gln Gly Asn Pro Asn Gln
Gly Gly Asn Asn Gly Gly Gly Asn Gln Gly Gly Gly Asn Gly Gly Cys
Thr Val Pro Lys;

Pro Gly Ser Gln Val Thr Thr Ser Thr Thr Ser Ser Ser Ser Thr Thr
Ser Arg Ala Thr Ser Thr Thr Ser Ala Gly Gly Val Thr Ser Ile Thr
Thr Ser Pro Thr Arg Thr Val Thr Ile Pro Gly Gly Ala Ser Thr Thr
Ala Ser Tyr Asn;

Glu Ser Gly Gly Gly Asn Thr Asn Pro Thr Asn Pro Thr Asn Pro Thr
Asn Pro Thr Asn Pro Thr Asn Pro Trp Asn Pro Gly Asn Pro Thr Asn
Pro Gly Asn Pro Gly Gly Gly Asn Gly Gly Asn Gly Gly Asn Cys Ser
Pro Leu; or

Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro Val Asn Gln
Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser Ser Pro Pro
Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu Arg

In another aspect, the present invention relates to a
carbohydrate binding domain homologous to a terminal A region
of Trichoderma reesei cellulases, which carbohydrate binding
domain comprises the following amino acid sequence

 1                                  10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa

            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa
Xaa

or a subsequence thereof capable of effecting binding of a
protein to an insoluble cellulosic or hemicellulosic substrate.

Examples of specific carbohydrate binding domains are those
with the amino acid sequences indicated above.

In a further aspect, the present invention relates to a linking
B region derived from a cellulose- or hemicellulose-degrading
enzyme, said region comprising an amino acid sequence enriched
in the amino acids glycine and/or asparagine and/or proline
and/or serine and/or threonine and/or glutamine. As indicated
above, these amino acids may often occur in short, repetitive
units. Examples of specific B region sequences are those shown
above.



The present invention provides a unique opportunity to
"shuffle" the various regions of different cellulose- or
hemicellulose-degrading enzymes, thereby creating novel
combinations of the CBD, B region and catalytically active
domain resulting in novel activity profiles of this type of
enzyme. Thus, the enzyme of the invention may be one which
comprises an amino acid sequence defining a CBD, which amino
acid sequence is derived from one naturally occurring
cellulose- or hemicellulose-degrading enzyme, an amino acid
sequence defining a linking B region, which amino acid sequence
is derived from another naturally occurring cellulose- or
hemicellulose-degrading enzyme, as well as a catalytically
active domain derived from the enzyme supplying either the CBD
or the B region or from a third enzyme. In a particular
embodiment, the catalytically active domain is derived from an
enzyme which does not, in nature, comprise any CBD or B region.
In this way, it is possible to construct enzymes with improved
binding properties from enzymes which lack the CBD and B
regions.

The enzyme of the invention is preferably a cellulase such as
an endoglucanase (capable of hydrolysing amorphous regions of
low crystallinity in cellulose fibres), a cellobiohydrolase
(also known as an exoglucanase, capable of initiating
degradation of cellulose from the non-reducing chain ends by
removing cellobiose units) or a β-glucosidase.

In a still further aspect, the present invention relates to a
DNA construct which comprises a DNA sequence encoding a
cellulose- or hemicellulose-degrading enzyme as described
above.

A DNA sequence encoding the present enzyme may, for instance,
be isolated by establishing a cDNA or genomic library of a
microorganism known to produce cellulose- or hemicellulose-degrading
enzymes, such as a strain of Humicola, Fusarium or
Myceliophthora, and screening for positive clones by conventional
procedures such as by hybridization to oligonucleotide probes
synthesized on the basis of the full or partial amino acid
sequence of the enzyme or probes based on the partial or full
DNA sequence of the A region from T. reesei cellulases, as
indicated above, or by selecting for clones expressing the
appropriate enzyme activity, or by selecting for clones
producing a protein which is reactive with an antibody raised
against a native cellulose- or hemicellulose-degrading enzyme.

Alternatively, the DNA sequence encoding the enzyme may be
prepared synthetically by established standard methods, e.g.
the phosphoramidite method described by S.L. Beaucage and M.H.
Caruthers, Tetrahedron Letters 22, 1981, pp. 1859-1869, or the
method described by Matthes et al., The EMBO J. 3, 1984, pp.
801-805. According to the phosphoramidite method,
oligonucleotides are synthesized, e.g. in an automatic DNA
synthesizer, purified, annealed, ligated and cloned in
appropriate vectors.

Finally, the DNA sequence may be of mixed genomic and
synthetic, mixed synthetic and cDNA or mixed genomic and cDNA
origin prepared by ligating fragments of synthetic, genomic or
cDNA origin (as appropriate), the fragments corresponding to
various parts of the entire DNA construct, in accordance with
standard techniques. Thus, it may be envisaged that a DNA
sequence encoding the CBD of the enzyme may be of genomic
origin, while the DNA sequence encoding the B region of the
enzyme may be of synthetic origin, or vice versa; the DNA
sequence encoding the catalytically active domain of the enzyme
may conveniently be of genomic or cDNA origin. The DNA
construct may also be prepared by polymerase chain reaction
using specific primers, for instance as described in US
4,683,202 or R.K. Saiki et al., Science 239, 1988, pp. 487-491.

The present invention also relates to an expression vector
which carries an inserted DNA construct as described above. The
expression vector may suitably comprise appropriate promoter,
operator and terminator sequences permitting the enzyme to be
expressed in a particular host organism, as well as an origin
of replication enabling the vector to replicate in the host
organism in question.

The resulting expression vector may then be transformed into a
suitable host cell, such as a fungal cell, a preferred example
of which is a species of Aspergillus, most preferably
Aspergillus oryzae or Aspergillus niger. Fungal cells may be
transformed by a process involving protoplast formation and
transformation of the protoplasts followed by regeneration of
the cell wall in a manner known per se. The use of Aspergillus
as a host microorganism is described in EP 238,023 (of Novo
Industri A/S), the contents of which are hereby incorporated by
reference.

Alternatively, the host organism may be a bacterium, in
particular strains of Streptomyces and Bacillus, and E. coli.
The transformation of bacterial cells may be performed
according to conventional methods, e.g. as described in
Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor, 1989.

The screening of appropriate DNA sequences and construction of
vectors may also be carried out by standard procedures, cf.
Sambrook et al., op. cit.

The invention further relates to a method of producing a
cellulose- or hemicellulose-degrading enzyme as described
above, wherein a cell transformed with the expression vector of
the invention is cultured under conditions conducive to the
production of the enzyme, and the enzyme is subsequently
recovered from the culture. The medium used to culture the
transformed host cells may be any conventional medium suitable
for growing the host cells in question. The expressed enzyme
may conveniently be secreted into the culture medium and may be
recovered therefrom by well-known procedures including
separating the cells from the medium by centrifugation or
filtration, precipitating proteinaceous components of the
medium by means of a salt such as ammonium sulphate, followed
by chromatographic procedures such as ion exchange
chromatography, affinity chromatography, or the like.

By employing recombinant DNA techniques as indicated above,
techniques of fermentation and mutation or other techniques
which are well known in the art, it is possible to provide
cellulose- or hemicellulose-degrading enzymes of a high purity
and in a high yield.

The present invention further relates to an agent for degrading
cellulose or hemicellulose, the agent comprising a cellulose-
or hemicellulose-degrading enzyme as described above. It is
contemplated that, dependent on the specificity of the enzyme,
it may be employed for one (or possibly more) of the
applications mentioned above. In a particular embodiment, the
agent may comprise a combination of two or more enzymes of the
invention or a combination of one or more enzymes of the
invention with one or more other enzymes with cellulose- or
hemicellulose-degrading activity.



BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows the construction of plasmid pSX224;

Fig. 2 shows the construction of plasmid pHW485;

Fig. 3 shows the construction of plasmids pHW697 and pHW704;

Fig. 4 shows the construction of plasmid pHW768;

Fig. 5 is a restriction map of plasmid pSX320;

Fig. 6 shows the construction of plasmid pSX777;

Fig. 7 shows the construction of plasmid pCaHj170;

Fig. 8 shows the construction of plasmid IM4;

Fig. 9 shows the SOE fusion of the ~43kD endoglucanase signal
peptide and the N-terminal of Endo 1;

Fig. 10 shows the construction of plasmid pCaHj180;

Fig. 11 shows the DNA sequence and derived amino acid sequence
of F. oxysporum C-family cellobiohydrolase;

Fig. 12 shows the DNA sequence and derived amino acid sequence
of F. oxysporum F-family cellulase;

Fig. 13 shows the DNA sequence and derived amino acid sequence
of F. oxysporum C-family endoglucanase;

Fig. 14A-E shows the DNA sequence and derived amino acid
sequence of H. insolens endoglucanase 1 (EG1); and

Fig. 15A-D shows the DNA sequence and derived amino acid
sequence of a fusion of the B. lautus (NCIMB 40250) Endo 1
catalytic domain and the CBD and B region of H. insolens ~43kD
endoglucanase.

The invention is further illustrated in the following examples
which are not in any way intended to limit the scope of the
invention as claimed.



Example 1

Isolation of A region-containing clones from H. insolens

From H. insolens strain DSM 1800 (described in, e.g., WO
89/09259) grown on cellulose, mRNA was prepared according to
the method described by Kaplan et al., Biochem. J. 183 (1979)
181-184. A cDNA library containing 20,000 clones was obtained
substantially by the method of Okayama and Berg, Methods in
Enzymology 154, 1987, pp. 3-28.

The cDNA library was screened as described by Gergen et al.,
Nucl. Acids Res. 7(8), 1979, pp. 2115-2136, with
oligonucleotide probes in the antisense configuration, designed
according to the published sequences of the N-terminal part of
the A-region of the four T. reesei cellulase genes (Penttila et
al., Gene 45 (1986), 253-63; Saloheimo et al., Gene 63 (1988),
11-21; Shoemaker et al., Biotechnology, October 1983, 691-696;
Teeri et al., Gene 51 (1987) 43-52). The probe sequences were as
follows:

NOR-804 5'-CTT GCA CCC GCT GTA CCC AAT GCC ACC GCA CTG CCC
(-EG 1) CCA-3'

NOR-805 5'-CGT GGG GCC GCT GTA GCC AAT ACC GCC GCA CTG GCC
(-CBH 1) GTA-3'

NOR-807 5'-AGT CGG ACC CGA CCA ATT CTG GCC ACC ACA TTG GCC
(~CBH 2) CCA-3'

NOR-808 5'-CGT AGG TCC GCT CCA ACC AAT ACC TCC ACA CTG GCC
(-EG 3) CCA-3'
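
Because the probes are in the antisense configuration, the peptide each one corresponds to is read from its reverse complement. The following is a minimal sketch, not part of the original disclosure: it assumes the standard genetic code, and the names `CODON_TABLE`, `PROBES` and `sense_peptide` are hypothetical.

```python
# Sketch: reverse-complement each antisense probe and translate the
# resulting sense strand with the standard genetic code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Probe sequences as given in the text (5' to 3', antisense).
PROBES = {
    "NOR-804": "CTT GCA CCC GCT GTA CCC AAT GCC ACC GCA CTG CCC CCA",
    "NOR-805": "CGT GGG GCC GCT GTA GCC AAT ACC GCC GCA CTG GCC GTA",
    "NOR-807": "AGT CGG ACC CGA CCA ATT CTG GCC ACC ACA TTG GCC CCA",
    "NOR-808": "CGT AGG TCC GCT CCA ACC AAT ACC TCC ACA CTG GCC CCA",
}


def sense_peptide(antisense_probe):
    """Translate the reverse complement of a probe (one-letter codes)."""
    dna = antisense_probe.replace(" ", "")
    sense = dna.translate(COMPLEMENT)[::-1]
    return "".join(CODON_TABLE[sense[i:i + 3]]
                   for i in range(0, len(sense), 3))


for name, probe in PROBES.items():
    print(name, sense_peptide(probe))
```

Each translated probe contains the run Gly-Gln-Cys-Gly-Gly (GQCGG), which corresponds to positions 2 to 6 of the consensus CBD sequence given earlier.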

Screening yielded a large number of candidates hybridising well
to the A-region probes. Restriction mapping reduced the number
of interesting clones to 17, of which 8 have so far been
sequenced (as described by Haltiner et al., Nucl. Acids Res.
13, 1985, pp. 1015-1025) sufficiently to confirm the presence
of a terminal CBD as well as a B-region.

The deduced amino acid sequences obtained for the CBDs were as
follows

A-1: Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys
Cys Glu Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln
Cys Leu;

A-5: Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr
Cys Val Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln
Cys Leu;

CBH-2: Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys
Cys Gln Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln
Cys Leu;

A-8: Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr
Cys Ala Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln
Cys Thr Pro;

A-9: Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys
Cys Ser Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln
Cys Leu Asn;

A-11: Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn
Cys Glu Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln
Cys Ile;

A-19: Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn
Cys Glu Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln
Cys Leu; and

~43 kD: Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr
Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln
Cys Leu

The deduced amino acid sequences obtained for the B region were
as follows

A1: Ala Arg Thr Asn Val Gly Gly Gly Ser Thr Gly Gly Gly Asn
Asn Gly Gly Gly Asn Asn Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly
Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly
Asn Cys Ser Pro Leu;

A5: Pro Gly Gly Asn Asn Asn Asn Pro Pro Pro Ala Thr Thr Ser
Gln Trp Thr Pro Pro Pro Ala Gln Thr Ser Ser Asn Pro Pro Pro Thr
Gly Gly Gly Gly Gly Asn Thr Leu His Glu Lys;

A8: Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly
Asn Asn Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn
Gly Gly Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu;

A11: Val Phe Thr Cys Ser Gly Asn Ser Gly Gly Gly Ser Asn Pro
Ser Asn Pro Asn Pro Pro Thr Pro Thr Thr Phe Ile Thr Gln Val Pro
Asn Pro Thr Pro Val Ser Pro Pro Thr Cys Thr Val Ala Lys;

A19: Pro Ala Leu Trp Pro Asn Asn Asn Pro Gln Gln Gly Asn Pro
Asn Gln Gly Gly Asn Asn Gly Gly Gly Asn Gln Gly Gly Gly Asn Gly
Gly Cys Thr Val Pro Lys;

CBH2: Pro Gly Ser Gln Val Thr Thr Ser Thr Thr Ser Ser Ser Ser
Thr Thr Ser Arg Ala Thr Ser Thr Thr Ser Ala Gly Gly Val Thr Ser
Ile Thr Thr Ser Pro Thr Arg Thr Val Thr Ile Pro Gly Gly Ala Ser
Thr Thr Ala Ser Tyr Asn;

A9: Glu Ser Gly Gly Gly Asn Thr Asn Pro Thr Asn Pro Thr Asn
Pro Thr Asn Pro Thr Asn Pro Thr Asn Pro Trp Asn Pro Gly Asn Pro
Thr Asn Pro Gly Asn Pro Gly Gly Gly Asn Gly Gly Asn Gly Gly Asn
Cys Ser Pro Leu; or

Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro Val Asn Gln
Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser Ser Pro Pro
Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu Arg

Example 2

Expression in A. oryzae of a CBH 2-type cellulase from H.
insolens

The complete sequence of one of the CBD clones shows a striking
similarity to a cellobiohydrolase (CBH 2) from T. reesei.

The construction of the expression vector pSX224 carrying the
H. insolens CBH 2 gene for expression in and secretion from A.
oryzae is outlined in Fig. 1. The vector p777 containing the
pUC 19 replicon and the regulatory regions of the TAKA amylase
promoter from A. oryzae and glucoamylase terminator from A.
niger is described in EP 238 023. pSX217 is composed of the
cloning vector pcDV1-pL1 (cf. Okayama and Berg, op. cit.)
carrying the H. insolens CBH 2 gene on a 1.8 kb fragment. The
CBH 2 gene contains three restriction sites used in the
construction: a BalI site at the initiating methionine codon in
the signal sequence, a BstBI site 620 bp downstream from the
BalI site and an AvaII site 860 bp downstream from the BstBI
site. The AvaII site is located in the non-translated C-terminal
part of the gene upstream of the poly A region, which
is not wanted in the final construction. Nor is the poly G
region upstream of the gene in the cloning vector. This region
is excised and replaced by an oligonucleotide linker which
places the translational start codon close to the BamHI site at
the end of the TAKA promoter.

The expression vector pSX224 was transformed into A. oryzae
IFO 4177 using the amdS gene from A. nidulans as the selective
marker as described in EP 238 023. Transformants were grown in
YPD medium (Sherman et al., Methods in Yeast Genetics, Cold
Spring Harbor Laboratory, 1981) for 3-4 days and analysed for
new protein species in the supernatant by sodium dodecyl
sulphate polyacrylamide gel electrophoresis. The CBH 2 from H.
insolens formed a band with an apparent Mw of 65 kD, indicating
a substantial glycosylation of the protein chain, which is
calculated to have a Mw of 51 kD on the basis of the amino acid
composition. The intact enzyme binds well to cellulose, while
enzymatic degradation products of 55 kD and 40 kD do not bind,
indicating removal of the A-region and possibly the B-region.
The enzyme has some activity towards filter paper, giving rise
to release of glucose. As expected, it has very limited
endoglucanase activity as measured on soluble cellulose in the
form of carboxymethyl cellulose.

Example 3

Isolation of Fusarium oxysporum genomic DNA

A freeze-dried culture of Fusarium oxysporum was reconstituted
with phosphate buffer, spotted 5 times on each of 5 FOX medium
plates (6% yeast extract, 1.5% K2HPO4, 0.75% MgSO4·7H2O, 22.5%
glucose, 1.5% agar, pH 5.6) and incubated at 37°C. After 6 days
of incubation the colonies were scraped from the plates into 15
ml of 0.001% Tween-80, which resulted in a thick and cloudy
suspension.

Four 1-liter flasks, each containing 300 ml of liquid FOX
medium, were inoculated with 2 ml of the spore suspension and
were incubated at 30°C and 240 rpm. On the 4th day of
incubation, the cultures were filtered through 4 layers of
sterile gauze and washed with sterile water. The mycelia were
dried on Whatman filter paper, frozen in liquid nitrogen,
ground into a fine powder in a cold mortar and added to 75 ml
of fresh lysis buffer (10 mM Tris-Cl, pH 7.4, 1% SDS, 50 mM EDTA,
100 µl DEPC). The thoroughly mixed suspension was incubated in
a 65°C waterbath for 1 hour and then spun for 10 minutes at
4000 rpm and 5°C in a bench-top centrifuge. The supernatant
was decanted and EtOH precipitated. After 1 hour on ice the
solution was spun at 19,000 rpm for 20 minutes. The supernatant
was decanted and isopropanol precipitated. Following
centrifugation at 10,000 rpm for 10 minutes, the supernatant
was decanted and the pellets allowed to dry.

One milliliter of TER solution (10 mM Tris-HCl, pH 7.4, 1 mM
EDTA, 100 µg RNAse A) was added to each tube, and the tubes
were stored at 4°C for two days. The tubes were pooled and
placed in a 65°C waterbath for 30 minutes to suspend non-dissolved
DNA. The solution was extracted twice with
phenol/CHCl3/isoamyl alcohol, twice with CHCl3/isoamyl alcohol
and then ethanol precipitated. The pellet was allowed to settle
and the EtOH was removed. 70% EtOH was added and the DNA stored
overnight at -20°C. After decanting and drying, 1 ml of TER
was added and the DNA was dissolved by incubating the tubes at
65°C for 1 hour. The preparation yielded 1.5 mg of genomic DNA.

Amplification, cloning and sequencing of DNA amplified with
degenerate primers

To amplify DNA from C-family (according to the nomenclature of
Henrissat et al. Gene 81 (1), 1989, pp. 83-96) cellulases using
PCR (cf. US 4,683,195 and US 4,683,202) each "sense"
oligonucleotide was used in combination with each "antisense"
oligonucleotide. Thus, the following primer pair was used:



Primer 1    Primer 2
ZC3220      ZC3221

ZC3220: GCC AAC TAC GGT ACC GG(A/C/G/T) TA(C/T) TG(C/T)
        GA(C/T) (A/G/T)(C/G)(A/G/C/T) CA(G/A) TG

ZC3221: GCG TTG GCC TCT AGA AT(G/A) TCC AT(C/T) TC(A/G/C/T)
        (C/G/T)(A/T)(G/A) CA(G/A) CA
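
As an illustration of what "degenerate" means for the primers ZC3220 and ZC3221, the following minimal sketch counts how many distinct sequences each primer pool contains. It is not part of the described method; the function names (`parse_positions`, `degeneracy`) are hypothetical, and the primer strings are transcribed from the listing above on the assumption that each parenthesised group marks one mixed position.

```python
import re
from math import prod

# Primer strings transcribed from the text; spaces are visual grouping only.
ZC3220 = ("GCC AAC TAC GGT ACC GG(A/C/G/T) TA(C/T) TG(C/T) "
          "GA(C/T) (A/G/T)(C/G)(A/G/C/T) CA(G/A) TG")
ZC3221 = ("GCG TTG GCC TCT AGA AT(G/A) TCC AT(C/T) TC(A/G/C/T) "
          "(C/G/T)(A/T)(G/A) CA(G/A) CA")


def parse_positions(primer):
    """Return, for every primer position, the list of bases allowed there."""
    compact = primer.replace(" ", "")
    # Each token is either a mixed position "(A/C)" or a single fixed base.
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", compact)
    return [mixed.split("/") if mixed else [single] for mixed, single in tokens]


def degeneracy(primer):
    """Number of distinct sequences in the primer pool."""
    return prod(len(options) for options in parse_positions(primer))


if __name__ == "__main__":
    for name, primer in (("ZC3220", ZC3220), ("ZC3221", ZC3221)):
        print(name, len(parse_positions(primer)), "nt,",
              degeneracy(primer), "sequences in the pool")
```

Under this reading both primers are 35 nucleotides long, with the degeneracy confined to the 3' portion.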
In the PCR reaction, 1 µg of Fusarium oxysporum genomic DNA was
used as the template. Ten times PCR buffer is 100 mM Tris-HCl pH
8.3, 500 mM KCl, 15 mM MgCl2, 0.1% gelatin (Perkin-Elmer Cetus).
The reactions contained the following ingredients:



dH2O              35.75 µl
10X PCR buffer     5    µl
template DNA       5    µl
primer 1           2    µl   (40 pmol)
primer 2           2    µl   (40 pmol)
Taq polymerase     0.25 µl   (1.25 U)
total             50    µl
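
The reaction set-up above can be checked arithmetically. The sketch below is an illustration only: it confirms that the listed volumes add up to the stated total and derives the final primer concentration. The helper names and the idea of scaling to a master mix with a pipetting excess are assumptions added for illustration, not something the text describes.

```python
# Volumes in microliters, as listed for one 50 µl reaction.
REACTION_UL = {
    "dH2O": 35.75,
    "10X PCR buffer": 5.0,
    "template DNA": 5.0,
    "primer 1": 2.0,
    "primer 2": 2.0,
    "Taq polymerase": 0.25,
}


def total_volume(mix):
    """Sum of all component volumes in µl."""
    return sum(mix.values())


def final_conc_uM(pmol, reaction_ul):
    """pmol per µl is numerically equal to µmol per liter (µM)."""
    return pmol / reaction_ul


def master_mix(mix, n_reactions, excess=0.1, exclude=("template DNA",)):
    """Hypothetical helper: scale shared components for n reactions."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2)
            for name, vol in mix.items() if name not in exclude}


if __name__ == "__main__":
    print("total:", total_volume(REACTION_UL), "µl")
    print("each primer:", final_conc_uM(40, total_volume(REACTION_UL)), "µM")
    print("buffer strength:",
          10 * REACTION_UL["10X PCR buffer"] / total_volume(REACTION_UL), "X")
    print(master_mix(REACTION_UL, n_reactions=10))
```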



The PCR reactions were performed for 40 cycles under the
following conditions:

94 °C 1.5 min
45 °C 2.0 min
72 °C 2.0 min



Five microliters of each reaction was analyzed by agarose gel
electrophoresis. The sizes of the DNA fragments were estimated
from DNA molecular weight markers. The reaction primed with
ZC3220 and ZC3221 produced two DNA fragments of appropriate
size to be candidates for fragments of C-family cellulases. The
agarose sections containing these two fragments were excised,
and the DNA was electroeluted and digested with the restriction
enzymes Kpn I and Xba I. The fragments were ligated into the
vector pUC18 which had been cut with the same two restriction
enzymes. The ligations were transformed into E. coli and
mini-prep DNA was prepared from the resulting colonies. The DNA
sequences of these inserts were determined and revealed that
two new C-family cellulases had been identified, one a new
cellobiohydrolase and the other a new endoglucanase.

The PCR cloning strategy described above for the C-family
cellulases was applied using other primers which encoded
conserved cellulase sequences within the known F-family
cellulases (cf. Henrissat et al., op. cit.). The following
primer pair was used for amplification of Fusarium genomic DNA.

Primer 1    Primer 2
ZC3226      ZC3227

ZC3226: TCC TGA CGC CAA GCT TT(A/G/T) (C/T)(A/T)(A/T)
        (A/C/T)AA (C/T)GA (C/T)TA (C/T)AA

ZC3227: CAC CGG CAC CAT CGA T(G/A)T C(A/C/G/T)A
        (G/A)(C/T)T C(A/G/C/T)G T(A/G/T)A T

The PCR reactions were performed for 40 cycles as follows:

94 °C 1.5 min
50 °C 2.0 min
72 °C 2.0 min

The 180 bp band was eluted from an agarose gel fragment,
digested with the restriction enzymes Hind III and Cla I and
ligated into pUC19 which had been digested with Hind III and
Acc I. The ligated DNA was transformed into E. coli and
mini-prep DNA was prepared from colony isolates. The DNA sequence
of the cloned DNA was determined. This fragment encoded sequences
corresponding to a new member of the F-family cellulases.
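
The restriction digests used to clone the PCR products rely on recognition sites carried in the fixed 5' portions of the primers. The following sketch is an illustration, not a step from the text: it searches the non-degenerate 5' part of each primer for the standard recognition sequences of the four enzymes named above. The dictionary and function names are hypothetical.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {
    "Kpn I": "GGTACC",
    "Xba I": "TCTAGA",
    "Hind III": "AAGCTT",
    "Cla I": "ATCGAT",
}

# Fixed (non-degenerate) 5' portions of the four primers, spaces removed.
PRIMER_5PRIME = {
    "ZC3220": "GCCAACTACGGTACCGG",
    "ZC3221": "GCGTTGGCCTCTAGAAT",
    "ZC3226": "TCCTGACGCCAAGCTTT",
    "ZC3227": "CACCGGCACCATCGAT",
}


def sites_in(seq):
    """Map enzyme name to the 0-based start of its site, if present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for primer, seq in PRIMER_5PRIME.items():
        print(primer, sites_in(seq))
```

Each primer carries exactly one of the four sites, consistent with the enzyme pairs used for the C-family and F-family fragments.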

Construction of a Fusarium oxysporum cDNA library

Fusarium oxysporum was grown by fermentation and samples were
withdrawn at various times for RNA extraction and cellulase
activity analysis. The activity analysis included an assay for
total cellulase activity as well as one for colour
clarification. Fusarium oxysporum samples demonstrating maximal
colour clarification were extracted for total RNA from which
poly(A)+ RNA was isolated.
To construct a Fusarium oxysporum cDNA library, first-strand
cDNA was synthesized in two reactions, one with and the other
without radiolabeled dATP. A 2.5X reaction mixture was
prepared at room temperature by mixing the following reagents
in the following order: 10 µl of 5X reverse transcriptase
buffer (Gibco-BRL, Gaithersburg, Maryland), 2.5 µl of 200 mM
dithiothreitol (made fresh or from a stock solution stored at
-70 °C), and 2.5 µl of a mixture containing 10 mM of each
deoxynucleotide triphosphate (dATP, dGTP, dTTP and 5-methyl
dCTP, obtained from Pharmacia LKB Biotechnology, Alameda, CA).
The reaction mixture was divided into two tubes of 7.5 µl each.
1.3 µl of 10 µCi/µl α-32P dATP (Amersham, Arlington
Heights, IL) was added to one tube and 1.3 µl of water to the
other. Seven microliters of each mixture was transferred to
final reaction tubes. In a separate tube, 5 µg of Fusarium
oxysporum poly(A)+ RNA in 14 µl of 5 mM Tris-HCl pH 7.4, .50 fM
EDTA was mixed with 2 µl of 1 µg/µl first strand primer (ZC2938
GACAGAGCACAGAATTCACTAGTGAGCTCT15). The RNA-primer mixture was
heated at 65 °C for 4 minutes, chilled in ice water, and
centrifuged briefly in a microfuge. Eight microliters of the
RNA-primer mixture was added to the final reaction tubes. Five
microliters of 200 U/µl Superscript™ reverse transcriptase
(Gibco-BRL) was added to each tube. After gentle agitation, the
tubes were incubated at 45 °C for 30 minutes. Eighty microliters
of 10 mM Tris-HCl pH 7.4, 1 mM EDTA was added to each tube, the
samples were vortexed, and briefly centrifuged. Three
microliters was removed from each tube to determine counts
incorporated by TCA precipitation and the total counts in the
reaction. A 2 µl sample from each tube was analyzed by gel
electrophoresis. The remainder of each sample was ethanol
precipitated in the presence of oyster glycogen. The nucleic
acids were pelleted by centrifugation, and the pellets were
washed with 80% ethanol. Following the ethanol wash, the
samples were air dried for 10 minutes. The first strand
synthesis yielded 1.6 µg of Fusarium oxysporum cDNA, a 33%
conversion of poly(A)+ RNA into DNA.
Second strand cDNA synthesis was performed on the RNA-DNA
hybrid from the first strand reactions under conditions which
encouraged first strand priming of second strand synthesis,
resulting in hairpin DNA. The first strand products from each
of the two first strand reactions were resuspended in 71 µl of
water. The following reagents were added, at room temperature,
to the reaction tubes: 20 µl of 5X second strand buffer (100 mM
Tris pH 7.4, 450 mM KCl, 23 mM MgCl2, and 50 mM (NH4)2SO4), 3
µl of 5 mM β-NAD, and [...] µl of a deoxynucleotide triphosphate
mixture with each at 10 mM. One microliter of α-32P dATP was
added to the reaction mixture which received unlabeled dATP for
the first strand synthesis, while the tube which received
labeled dATP for first strand synthesis received 1 µl of water.
Each tube then received 0.6 µl of 7 U/µl E. coli DNA ligase
(Boehringer-Mannheim, Indianapolis, IN), 3.1 µl of 8 U/µl E.
coli DNA polymerase I (Amersham), and 1 µl of 2 U/µl RNase H
(Gibco-BRL). The reactions were incubated at 16 °C for 2 hours.
After incubation, 2 µl from each reaction was used to determine
TCA precipitable counts and total counts in the reaction, and
2 µl from each reaction was analyzed by gel electrophoresis. To
the remainder of each sample, 2 µl of 2.5 µg/µl oyster
glycogen, 5 µl of 0.5 M EDTA and 200 µl of 10 mM Tris-HCl pH 7.4,
1 mM EDTA were added. The samples were phenol-chloroform
extracted and isopropanol precipitated. After centrifugation
the pellets were washed with 100 µl of 80% ethanol and air
dried. The yield of double stranded cDNA in each of the
reactions was approximately 2.5 µg.

Mung bean nuclease treatment was used to clip the
single-stranded DNA of the hairpin. Each cDNA pellet was
resuspended in 15 µl of water and 2.5 µl of 10X mung bean buffer
(0.3 M NaAc pH 4.6, 3 M NaCl, and 10 mM ZnSO4), 2.5 µl of 10 mM
DTT, 2.5 µl of 50% glycerol, and 2.5 µl of 10 U/µl mung bean
nuclease (New England Biolabs, Beverly, MA) were added to each
tube. The reactions were incubated at 30 °C for 30 minutes and
75 µl of 10 mM Tris-HCl pH 7.4 and 1 mM EDTA was added to each
tube. Two-microliter aliquots were analyzed by alkaline agarose
gel analysis. One hundred microliters of 1 M Tris-HCl pH 7.4
was added to each tube and the samples were phenol-chloroform
extracted twice. The DNA was isopropanol precipitated and
pelleted by centrifugation. After centrifugation, the DNA
pellet was washed with 80% ethanol and air dried. The yield was
approximately 2 µg of DNA from each of the two reactions.

The cDNA ends were blunted by treatment with T4 DNA polymerase.
DNA from the two samples was combined after resuspension in a
total volume of 24 µl of water. Four microliters of 10X T4
buffer (330 mM Tris-acetate pH 7.9, 670 mM KAc, 100 mM MgAc,
and 1 mg/ml gelatin), 4 µl of 1 mM dNTP, 4 µl of 50 mM DTT, and
4 µl of 1 U/µl T4 DNA polymerase (Boehringer-Mannheim) were added
to the DNA. The samples were incubated at 15 °C for 1 hour.
After incubation, 160 µl of 10 mM Tris-HCl pH 7.4, 1 mM EDTA
was added, and the sample was phenol-chloroform extracted. The
DNA was isopropanol precipitated and pelleted by
centrifugation. After centrifugation the DNA was washed with
80% ethanol and air dried.

After resuspension of the DNA in 6.5 µl water, Eco RI adapters
were added to the blunted DNA. One microliter of 1 µg/µl Eco RI
adapter (Invitrogen, San Diego, CA, Cat. # N409-20), 1 µl of 10X
ligase buffer (0.5 M Tris pH 7.8 and 50 mM MgCl2), 0.5 µl of 10
mM ATP, 0.5 µl of 100 mM DTT, and 1 µl of 1 U/µl T4 DNA ligase
(Boehringer-Mannheim) were added to the DNA. After the sample
was incubated overnight at room temperature, the ligase was
heat denatured at 65 °C for 15 minutes.

The Sst I cloning site encoded by the first strand primer was
exposed by digestion with Sst I endonuclease. Thirty-three
microliters of water, 5 µl of 10X Sst I buffer (0.5 M Tris pH
8.0, 0.1 M MgCl2, and 0.5 M NaCl), and 2 µl of 5 U/µl Sst I
were added to the DNA, and the samples were incubated at 37 °C
for 2 hours. One hundred and fifty microliters of 10 mM
Tris-HCl pH 7.4, 1 mM EDTA was added, the sample was
phenol-chloroform extracted, and the DNA was isopropanol
precipitated.
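
To make the role of the first strand primer concrete, the sketch below locates the Sst I recognition sequence (GAGCTC) in ZC2938 and separates it from the oligo(dT) tail. It is an illustration only and assumes that the primer as printed ends in GAGCTC followed by a run of 15 T residues; the variable names are hypothetical.

```python
# ZC2938 as read from the text: a defined 5' part followed by 15 T residues.
# Reading the trailing "T15" as fifteen T's is an assumption.
DEFINED_PART = "GACAGAGCACAGAATTCACTAGTGAGCTC"
OLIGO_DT_LENGTH = 15
ZC2938 = DEFINED_PART + "T" * OLIGO_DT_LENGTH

SST_I_SITE = "GAGCTC"  # standard Sst I recognition sequence

site_start = ZC2938.find(SST_I_SITE)            # 0-based index of the site
tail_start = len(ZC2938) - OLIGO_DT_LENGTH      # where the oligo(dT) tail begins

if __name__ == "__main__":
    print("primer length:", len(ZC2938))
    print("Sst I site at positions", site_start, "to", site_start + 5)
    print("oligo(dT) tail starts at position", tail_start)
```

On this reading the Sst I site sits immediately 5' of the oligo(dT) tail, so digestion with Sst I trims the cDNA close to the poly(A) end.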
The cDNA was chromatographed on a Sepharose CL 2B (Pharmacia
LKB Biotechnology) column to size-select the cDNA and to remove
free adapters. A 1.1 ml column of Sepharose CL 2B was poured
into a 1 ml plastic disposable pipet and the column was washed
with 50 column volumes of buffer (10 mM Tris pH 7.4 and 1 mM
EDTA). The sample was applied, one-drop fractions were
collected, and the DNA in the void volume was pooled. The
fractionated DNA was isopropanol precipitated. After
centrifugation the DNA was washed with 80% ethanol and air
dried.

A Fusarium oxysporum cDNA library was established by ligating
the cDNA to the vector pYcDE8' (cf. WO 90/10698) which had been
digested with Eco RI and Sst I. Three hundred and ninety
nanograms of vector was ligated to 400 ng of cDNA in an 80 µl
ligation reaction containing 8 µl of 10X ligase buffer, 4 µl
of 10 mM ATP, 4 µl of 200 mM DTT, and 1 unit of T4 DNA ligase
(Boehringer-Mannheim). After overnight incubation at room
temperature, 5 µg of oyster glycogen and 120 µl of 10 mM
Tris-HCl and 1 mM EDTA were added and the sample was
phenol-chloroform extracted. The DNA was ethanol precipitated,
centrifuged, and the DNA pellet washed with 80% ethanol. After
air drying, the DNA was resuspended in 3 µl of water. Thirty
seven microliters of electroporation competent DH10B cells
(Gibco-BRL) was added to the DNA, and electroporation was
completed with a Bio-Rad Gene Pulser (Model #1652076) and
Bio-Rad Pulse Controller (Model #1652098) electroporation unit
(Bio-Rad Laboratories, Richmond, CA). Four milliliters of SOC
(Hanahan, J. Mol. Biol. 166 (1983), 557-580) was added to the
electroporated cells, and 400 µl of the cell suspension was
spread on each of ten 150 mm LB ampicillin plates. After an
overnight incubation, 10 ml of LB amp media was added to each
plate, and the cells were scraped into the media. Glycerol
stocks and plasmid preparations were made from each plate. The
library background (vector without insert) was established at
approximately 1% by ligating the vector without insert and
titering the number of clones after electroporation.
Screening the cDNA library

Full length cellulase cDNA clones were isolated from the
Fusarium oxysporum cDNA library by hybridization to
PCR-generated genomic oligonucleotide probes.

The PCR-generated oligonucleotides: ZC3309, a 40-mer coding for
part of the C family cellobiohydrolase, ATT ACC AAC ACC AGC GTT
GAC ATC ACT GTC AGA GGG CTT C; ZC3310, a 28-mer coding for the
C family endoglucanase, AAC TCC GTT GAT GAA AGG AGT GAC GTA G;
and ZC3311, a 40-mer coding for the F family cellulase, CGG AGA
GCA GCA GGA ACA CCA GAG GCA GGG TTC CAG CCA C, were end labeled
with T4 polynucleotide kinase and γ-32P ATP. For the
kinase reaction 17 picomoles of each oligonucleotide were
brought up to 12.5 µl volume with deionized water. To these
were added 2 µl 10X kinase buffer (1X: 10 mM magnesium
chloride, 0.1 mM EDTA, 50 mM Tris pH 7.8), 0.5 µl 200 mM
dithiothreitol, 1 µl γ-32P ATP (150 mCi/ml, Amersham), 2 µl
T4 polynucleotide kinase (10 U/µl, BRL). The samples were then
mixed and incubated at 37 °C for 30 minutes. Oligonucleotides
were separated from unincorporated nucleotides by precipitation
with 180 µl TE (10 mM Tris pH 8.0, 1 mM EDTA), 100 µl 7.5 M
ammonium acetate, 2 µl mussel glycogen (20 mg/ml, Gibco-BRL)
and 750 µl 100% ethanol. Pellets were dissolved in 200 µl
distilled water. To determine the amount of radioactivity
incorporated in the oligonucleotides, 10 µl of 1:1000 dilutions
of oligonucleotides were read without scintillation fluid in a
Beckman LS 1800 Liquid Scintillation System. Activities were:
115 million cpm for ZC3309, 86 million cpm for ZC3310, and 79
million cpm for ZC3311.
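
The stated probe lengths can be confirmed directly from the printed sequences. The sketch below is an illustration only; it counts the nucleotides of each probe and reports the G+C fraction, which differs between the probes. The helper names are hypothetical.

```python
# Probe sequences as printed in the text (spaces are visual grouping only).
PROBES = {
    "ZC3309": "ATT ACC AAC ACC AGC GTT GAC ATC ACT GTC AGA GGG CTT C",
    "ZC3310": "AAC TCC GTT GAT GAA AGG AGT GAC GTA G",
    "ZC3311": "CGG AGA GCA GCA GGA ACA CCA GAG GCA GGG TTC CAG CCA C",
}


def clean(seq):
    """Remove the grouping spaces."""
    return seq.replace(" ", "")


def gc_fraction(seq):
    """Fraction of G and C residues in the sequence."""
    seq = clean(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PROBES.items():
        print(f"{name}: {len(clean(seq))}-mer, GC = {gc_fraction(seq):.2f}")
```

The lengths come out as 40, 28 and 40 nucleotides, matching the description of the three probes.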

Initially, a library of 20,000 cDNA clones was probed with a
mixture of each of the three oligonucleotides corresponding to
the C family cellobiohydrolase, C family endoglucanase and F
family cellulase clones. The cDNA library was plated out from
titered glycerol stocks stored at -70 °C. Four thousand clones
were plated out on each of five 150 mm LB ampicillin (1000
µg/ml) plates. Lifts were taken in duplicate following standard
methodology (Sambrook et al., Molecular Cloning, 1989) using
Biotrans 0.2 µm 137 mm filters. The filters were baked at 80 °C
in vacuum for 2 hours, then swirled overnight in a
crystallizing dish (Pharmacia LKB Biotechnology, Alameda, CA)
at 37 °C in 80 ml prehybridization solution (5X Denhardt's (1X:
0.02% Ficoll, 0.02% polyvinylpyrrolidone, 0.02% bovine serum
albumin, Pentax Fraction 5 (Sigma, St. Louis, MO)), 5X SSC (1X:
0.15 M sodium chloride, 0.15 M sodium citrate pH 7.3), 100
µg/ml denatured sonicated salmon sperm DNA, 50 mM sodium
phosphate pH 6.8, 1 mM sodium pyrophosphate, 100 µM ATP, 20%
formamide, 1% sodium dodecyl sulfate) (Ulrich et al. EMBO J. 3
(1984), 361-364).

Prehybridized filters were probed by adding them one at a time
into a crystallizing dish with 80 ml prehybridization solution
with 80 million cpm ZC3309, 86 million cpm ZC3310 and 79
million cpm ZC3311 and then swirled overnight at 37 °C. Filters
were then washed to high stringency. The probed filters were
washed with three 400 ml volumes of low stringency wash
solution (2X SSC, 0.1% SDS) at room temperature in the
crystallizing dish, then with four 1-liter volumes in a plastic
box. A further wash for 20 minutes at 68 °C with
tetramethylammonium chloride wash solution (TMACL: 3 M
tetramethylammonium chloride, 50 mM Tris-HCl pH 8.0, 2 mM EDTA,
1 g/l SDS) (Wood et al., Proc. Natl. Acad. Sci. 82 (1985),
1585-1588) provided a high stringency wash for the 28-mer ZC3310
independent of its base composition. The filters
were then blotted dry, mounted on Whatman 3 MM paper and covered
with plastic wrap for autoradiography. They were exposed
overnight at -70 °C with intensifying screens and Kodak XAR-5
film.

Two putative positives appeared on duplicate filters. The
corresponding areas on the plates with colonies were picked
into 1 ml of 1X polymerase chain reaction (PCR) buffer (100 mM
Tris-HCl pH 8.3, 500 mM KCl, 15 mM MgCl2, 0.1% gelatin;
Perkin-Elmer Cetus) and plated out at five tenfold dilutions on
100 mm LB plates with 70 µg/ml ampicillin. These plates were
grown at 37 °C overnight. Two dilutions of each putative clone
were chosen for rescreening as outlined above. One isolated
clone, pZFH196, was found. This was grown up overnight in 10 ml
2X YT broth (per liter: 16 g bacto-tryptone, 10 g bacto-yeast
extract, 10 g NaCl). Twenty three micrograms of DNA were
purified by the rapid boiling method (Holmes and Quigley,
Anal. Biochem. 114 (1981), 193-197). From restriction analysis
the clone was found to be approximately 2,000 base pairs in
length. Sequence analysis showed it to contain a fragment
homologous to the C family cellobiohydrolase fragment cloned by
PCR.

In an attempt to isolate additional cellulase cDNA clones, a
cDNA library of 2 million clones was plated out on twenty 150 mm
LB plates (100 µg/ml ampicillin), each containing approximately
100,000 cDNA clones. Lifts were taken in duplicate as in the
first screening attempt. This library was screened with
oligonucleotides corresponding to the three cellulase species
as described above except that the hybridization was carried
out with formamide in the prehybridization buffer and at a
temperature of 30 °C. Washing with TMACL was carried out twice
for 20 minutes at 67 °C. Between 8 and 20 signals were found on
duplicate filters of each of the 20 plates. Fifteen plugs were
taken from the first plate with the large end of a Pasteur
pipet into 1 ml 1X PCR buffer (Perkin-Elmer Cetus). PCR was
carried out on the bacterial plugs with three separate
oligonucleotide mixtures. Each mixture contained the vector
specific oligonucleotide ZC2847 and, additionally, a different
cellulase specific oligonucleotide (ZC3309, ZC3310 or ZC3311)
within each mixture. Amplitaq polymerase (Perkin-Elmer Cetus)
was used with Pharmacia Ultrapure dNTP, following Perkin-Elmer
Cetus procedures. Sixteen picomoles of each primer were
used in 40 µl reaction volumes. Twenty microliters of cells in
1X PCR buffer were added to 20 µl mastermix which contained
everything needed for PCR except for DNA. After an initial 1
minute 45 second denaturation at 94 °C, 28 cycles of 45 seconds
at 94 °C, 1 minute at 45 °C and 2 minutes at 72 °C, with a final
extension of 10 minutes at 72 °C, were employed in a Perkin-Elmer
thermocycler. Ten of the 15 plugs yielded a band when primed
with the C family specific oligonucleotide ZC3309 and ZC2847.
The other mixtures gave no specific products. Five plugs which
produced the largest bands by PCR, therefore possibly being
full length C family cellobiohydrolases, along with the 5 plugs
which did not produce PCR bands, were plated out at five
10-fold dilutions onto 100 mm LB plates with 70 µg/ml ampicillin
and grown overnight. Duplicate lifts were taken of two tenfold
dilutions each. Prehybridization and hybridization were carried
out as described above with a mixture of the 3
oligonucleotides. Isolated clones were found on all 10 of the
platings. These were picked from the dilution plates with a
toothpick for single colony isolation on 100 mm LB plates with
70 µg/ml ampicillin. PCR was carried out on isolated bacterial
colonies with 2 oligonucleotides specific for the C family
cellobiohydrolase (ZC3409 (CCG TTC TGG ACG TAC AGA) and ZC3411
(TGA TGT CAA GTT CAT CAA)). Conditions were identical to those
described above except for using 10 picomoles of each primer in
25 µl reaction volumes. Colonies were added by toothpick into
PCR tubes with 25 µl mastermix before cycling. Five of the 10
gave strong bands of the size expected for a C family
cellobiohydrolase. Isolated colonies were then grown up in 20
ml of Terrific Broth (Sambrook et al., op. cit., A2) and DNA
was isolated by the rapid boiling method. The clones were
partially sequenced by Sanger dideoxy sequencing. From sequence
analysis the 5 clones which did not give bands specific for a
C family cellobiohydrolase by PCR were shown to be F family
cellulase clones.

In order to clone the C family endoglucanase, the cDNA library
of 2 million clones was rescreened with only ZC3310. Conditions
of prehybridization and hybridization were like those used
above. Filters were hybridized for 10 hours at 30 °C with one
million cpm end-labeled ZC3310 per ml prehybridization solution
without formamide. Washing with TMACL was carried out 2 times
for 20 minutes at 60 °C. Seven weak signals were found on
duplicate filters. Plugs were picked with the large end of a
pipet into 1 ml LB broth. These were each plated out in five
10-fold dilutions on 100 mm LB plates with 70 µg/ml ampicillin.
Duplicate lifts were taken of 2 dilutions each and were
processed as described above. Prehybridization, hybridization,
and washing were carried out as for the first level of
screening. Three isolated clones were identified and streaked
out for single colony hybridization. Isolates were grown
overnight in 50 ml of Terrific Broth (per liter: 12 g tryptone,
24 g yeast extract, 4 ml glycerol, autoclaved, and 100 ml of
0.17 M KH2PO4, 0.72 M K2HPO4) (Sambrook et al., op. cit., A2)
and DNA was isolated by alkaline lysis and PEG precipitation by
standard methods (Maniatis 1989, 1.38-1.41). From restriction
analysis, one clone (pZFH223) was longer than the others and
was chosen for complete sequencing. Sequence analysis showed it
to contain the PCR fragment cloned initially.

DNA sequence analysis

The cDNAs were sequenced in the yeast expression vector
pYcDE8'. The dideoxy chain termination method (F. Sanger et
al., Proc. Natl. Acad. Sci. USA 74, 1977, pp. 5463-5467) using
α-35S dATP from New England Nuclear (cf. M.D. Biggin et al.,
Proc. Natl. Acad. Sci. USA 80, 1983, pp. 3963-3965) was used
for all sequencing reactions. The reactions were catalysed by
modified T7 DNA polymerase from Pharmacia (cf. S. Tabor and
C.C. Richardson, Proc. Natl. Acad. Sci. USA 84, 1987, pp.
4767-4771) and were primed with an oligonucleotide complementary
to the ADH1 promoter (ZC996: ATT GTT CTC GTT CCC TTT CTT),
complementary to the CYC1 terminator (ZC3635: TGT ACG CAT GTA
ACA TTA) or with oligonucleotides complementary to the DNA of
interest. Double stranded templates were denatured with NaOH
(E.Y. Chen and P.H. Seeburg, DNA 4, 1985, pp. 165-170) prior to
hybridizing with a sequencing oligonucleotide. Oligonucleotides
were synthesized on an Applied Biosystems Model 380A DNA
synthesizer. The oligonucleotides used for the sequencing
reactions are listed in the sequencing oligonucleotide table
below:

C-family cellobiohydrolase sequencing primers
ZC3411  TGA TGT CAA GTT CAT CAA
ZC3408  TCT GTA CGT CCA GAA CGG
ZC3407  ATG ACT TCT CTA AGA AGG
ZC3406  TCC AAC ATC AAG TTC GGT
ZC3410  AGG CCA ACT CCA TCT GAA
ZC3309  ATT ACC AAC ACC AGC GTT GAC ATC ACT GTC AGA GGG CTC C
ZC3409  CCG TTC TGG ACG TAC AGA

F-family cellulase specific sequencing primers
ZC3413  CCA TCG ACG GTA TTG GAT
ZC3311  CGG AGA GCA GCA GGA ACA CCA GAG GCA GGG TTC CAG CCA C
ZC3412  GAG GGT AGA GCG ATC GTT

C-family endoglucanase specific sequencing primers
ZC3739  TGA TCT CAT CGA GCT GCA CC
ZC3684  GTG ATG CTC AGT GCT ACG TC
ZC3310  AAC TCC GTT GAT GAA AGG AGT GAC GTA G
ZC3750  TCC AAT AGC TTC CCA GCA AG
ZC3683  TGT CCC TTG ATG TTG CCA AC

The DNA sequences of the full-length cDNA clones, as well as
the derived amino acid sequences, are shown in the appended
Figs. 11 (C-family cellobiohydrolase), 12 (F-family cellulase)
and 13 (C-family endoglucanase).
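
As a small consistency check on the primer table, the sketch below computes reverse complements of the listed sequences. It shows that, as printed, ZC3408 is the exact reverse complement of ZC3409, so the two would prime from the same position on opposite strands. This is an observation derived from the printed sequences rather than a statement made in the text, and the function name is hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence; grouping spaces are ignored."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


# Two of the C-family cellobiohydrolase sequencing primers, as printed.
ZC3408 = "TCT GTA CGT CCA GAA CGG"
ZC3409 = "CCG TTC TGG ACG TAC AGA"

if __name__ == "__main__":
    print("revcomp(ZC3409):", reverse_complement(ZC3409))
    print("ZC3408         :", ZC3408.replace(" ", ""))
```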
Example 4

Isolation of endoglucanase EGI gene from H. insolens

The cDNA library described in example 1 was also screened with
a 35 bp oligonucleotide probe in the antisense configuration
with the sequence:

NOR-770: 5' GCTTCGCCCATGCCTTGGGTGGCGCCGAGTTCCAT 3'

The sequence was derived from the amino acid sequence of an
alcalase fragment of EGI purified from H. insolens, using our
knowledge of codon bias in this organism. Complete clones of
1.6 kb contained the entire coding sequence of 1.3 kb as shown
in Fig. 14A-E. The probe sequence NOR-770 is located at
Met344-Ala355.
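
Because NOR-770 is given in the antisense orientation, the peptide it corresponds to can be read by taking the reverse complement and translating it. The sketch below is an illustration using the standard genetic code; the reading frame (starting at the first base of the reverse complement) is an assumption, chosen because it yields a peptide beginning with Met as the text indicates.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


NOR_770 = "GCTTCGCCCATGCCTTGGGTGGCGCCGAGTTCCAT"  # antisense probe from the text
sense = reverse_complement(NOR_770)
peptide = translate(sense)

if __name__ == "__main__":
    print("sense strand:", sense)
    print("peptide     :", peptide, "+ partial codon", sense[-2:])
```

The 35 nucleotides give eleven complete codons starting with Met, followed by the partial codon GC. Every codon beginning with GC encodes Ala, which agrees with a twelve-residue span from Met344 to Ala355.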

Construction of expression plasmids of EGI (full length) and
EGI' (truncated)

The EGI gene still containing the poly-A tail was inserted into
an A. oryzae expression plasmid as outlined in Fig. 2. The
coding region of EGI was cut out from the NcoI-site in the
initiating Met-codon to the BamHI-site downstream of the poly-A
region as a 1450 bp fragment from pHW480. This was ligated to
a 3.6 kb NcoI-NarI fragment from pSX224 (Fig. 1) containing the
TAKA promoter and most of pUC19, and to a 960 bp NarI-BamHI
fragment containing the remaining part with the AMG-terminator.
The 960 bp fragment was taken from p960 which is equivalent to
p777 (described in EP 238,023) except for the inserted gene.
The resulting expression plasmid is termed pHW485.

The expression plasmid pHW704 with the full length EGI gene
without poly-A tail is shown in Fig. 3. From the BstEII site
1300 bp downstream of the NcoI-site was inserted a 102 bp
BstEII-BamHI linker (2645/2646) ligated to the BglII-site in the
vector. The linker contains the coding region downstream of the
BstEII-site with 2 stop codons at the end and a PvuI-site near
the C-terminal to be used for addition of CBD and B-regions.



WO 91/17244 PCT/DK91/00124

Expression plasmid pHW697 with the truncated EGI' gene was
constructed similarly using a BstEII-BamHI linker (2492/2493)
of 69 bp. In this linker we introduced a PstI-site altering
Val421 to Leu421, and the last 13 amino acids of the coding
region, K423PKPKPGHGPRSD435, were eliminated. The short tail
with the rather unusual sequence was cut off to give EGI' a
C-terminal corresponding to the one found in T. reesei EGI just
upstream of the A and B-region.
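
The effect of the two BstEII-BamHI linkers on the C-terminus can be made explicit by translating their top strands, which are listed under "Linkers" further on. The sketch below is an illustration using the standard genetic code. It assumes that the reading frame starts at the first base of each linker (GTC) and anchors the residue numbering on Asp435 as the last residue of full length EGI; both assumptions are consistent with the residue numbers quoted in the text.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Top strands (5'->3') of the two linkers as printed in the linker list.
FULL_TOP = ("GTCACCTACACCAACCTCCGCTGGGGCGAGATCGGC"
            "TCGACCTACCAGGAGGTTCAGAAGCCTAAGCCCAAG"
            "CCCGGGCACGGCCCCCGATCGGACTAATAG")          # linker 2645/2646
TRUNC_TOP = ("GTCACCTACACCAACCTCCGCTGGGGCGAG"
             "ATCGGCTCGACCTACCAGGAGCTGCAGTAGTAA"
             "TGATAG")                                  # linker 2492/2493


def translate_to_stop(seq):
    """Translate in frame 1 and return the peptide before the first stop."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    return "".join(CODON_TABLE[c] for c in codons).split("*")[0]


full_pep = translate_to_stop(FULL_TOP)
trunc_pep = translate_to_stop(TRUNC_TOP)

# Assumption: the last residue encoded by the full length linker is Asp435.
first_number = 435 - len(full_pep) + 1
changes = [f"{a}{first_number + i}{b}"
           for i, (a, b) in enumerate(zip(full_pep, trunc_pep)) if a != b]
removed_tail = full_pep[len(trunc_pep):]

if __name__ == "__main__":
    print("full length:", full_pep)
    print("truncated  :", trunc_pep)
    print("substitutions:", changes)
    print("removed tail :", removed_tail,
          f"({first_number + len(trunc_pep)}-435)")
```

Under these assumptions the truncated linker ends at Leu421-Gln422, the PstI recognition sequence CTGCAG supplies the Leu and Gln codons, and the 13 residues removed are KPKPKPGHGPRSD.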

Construction of an expression plasmid of EGI' with CBD and B
region from a ~43 kD endoglucanase added C-terminally

The ~43 kD endoglucanase of H. insolens described in DK patent
application No. 736/91 has shown good washing performance.
Besides the catalytic domain, the 43 kD cellulase has a
C-terminal CBD and B region which have been transferred to EGI',
which does not have any CBD or B region itself. The construction
was done in 2 steps, as outlined in Fig. 4. The PstI-HincII
linker (028 M/030 M), intended to connect the C-terminal of EGI'
to the B-region of the 43 kD cellulase, was subcloned in pUC19
PstI-EcoRI with the C-terminal HincII-EcoRI 100 bp fragment from
the 43 kD cellulase gene in pSX320 (Fig. 5; as described in DK
736/91). From the subclone pHW767 the CBD and B-region was cut
out as a 250 bp PstI-BglII fragment and ligated to the pHW485
(Fig. 2) BstEII-BglII fragment of 5.7 kb and to the remaining
BstEII-PstI fragment of 55 bp from pHW697 (Fig. 3). The resulting
expression plasmid pHW768 has the ~43 kD endoglucanase CBD and
B region added to Gln422 of EGI'.

Construction of an expression plasmid of EGI with the CBD and
B region from ~43 kD endoglucanase added C-terminally

This plasmid was constructed in a similar way as pHW768 except
that, in this case, the C-terminal linker yielded the complete
sequence of EGI. Fig. 6 shows the procedure in 3 steps. The
PvuI-HincII linker (040 M/041 M) was subcloned in pUC18 to give
pHW775, into which a HincII-EcoRI 1000 bp fragment from pSX320
(Fig. 5) was inserted to give pHW776. From this the CBD and B
region was cut out as a 250 bp PvuI-BglII fragment and ligated
to the 5.7 kb BstEII-BglII fragment from pHW485 (Fig. 2) and the
90 bp BstEII-PvuI fragment from pHW704 (Fig. 3). The resulting
expression plasmid pHW777 contains the ~43 kD endoglucanase
CBD and B region added to Asp435 in the complete EGI sequence.



Expression in A. oryzae of EGI and EGI' with and without the
CBD and B region from ~43 kD endoglucanase

The expression plasmids pHW485, pHW704, pHW697, pHW768 and
pHW777 were transformed into A. oryzae IFO 4177 as described in
example 2. Supernatants from transformants grown in YPD medium
as described were analyzed by SDS-PAGE, where the native EGI
has an apparent Mw of 53 kD. EGI' looks slightly smaller, as
expected, and the species with the added CBD and B region are
increased in molecular weight corresponding to the size of the
CBD and B region with some carbohydrate added. A polyclonal
antibody AS169 raised against the ~43 kD endoglucanase
recognizes EGI and EGI' only when the ~43 kD CBD and B region
are added, while all 4 species are recognized by a polyclonal
antibody AS78 raised against a cellulase preparation from H.
insolens. All 4 species have endoglucanase activity as measured
on soluble cellulose in the form of carboxymethyl cellulose.



Linkers

2492/2493: BstEII-PstI-BamHI

5' GTCACCTACACCAACCTCCGCTGGGGCGAG
3'      GATGTGGTTGGAGGCGACCCCGCTC

   ATCGGCTCGACCTACCAGGAGCTGCAGTAGTAA
   TAGCCGAGCTGGATGGTCCTCGACGTCATCATT

   TGATAG     3'   69 bp
   ACTATCCTAG 5'   68 bp
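
The two strands of linker 2492/2493 can be checked for base pairing and for the single-stranded ends that make the linker compatible with BstEII- and BamHI/BglII-cut DNA. The sketch below is an illustration only; the 5-nucleotide offset of the lower strand is an assumption based on the lengths given (69 and 68 nucleotides) and on the 5-base overhang left by BstEII, and the function name is hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Upper strand written 5'->3', lower strand written 3'->5', as in the listing.
TOP_5_TO_3 = ("GTCACCTACACCAACCTCCGCTGGGGCGAG"
              "ATCGGCTCGACCTACCAGGAGCTGCAGTAGTAA"
              "TGATAG")
BOTTOM_3_TO_5 = ("GATGTGGTTGGAGGCGACCCCGCTC"
                 "TAGCCGAGCTGGATGGTCCTCGACGTCATCATT"
                 "ACTATCCTAG")


def duplex_report(top, bottom, offset):
    """Check pairing, assuming bottom[j] lies under top[j + offset]."""
    paired = min(len(top) - offset, len(bottom))
    mismatches = [j for j in range(paired)
                  if PAIR[top[j + offset]] != bottom[j]]
    return {
        "paired_bp": paired,
        "mismatches": mismatches,
        "top_5prime_overhang": top[:offset],
        # Reversed so that the overhang reads 5'->3'.
        "bottom_5prime_overhang": bottom[paired:][::-1],
    }


report = duplex_report(TOP_5_TO_3, BOTTOM_3_TO_5, offset=5)

if __name__ == "__main__":
    for key, value in report.items():
        print(key, "=", value)
```

With this alignment the strands pair over 64 bp without mismatches, leaving a GTCAC overhang at the BstEII end and a GATC overhang at the BamHI end.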
2645/2646: BstEII-XmaI-PvuI-BamHI

5' GTCACCTACACCAACCTCCGCTGGGGCGAGATCGGC
3'      GATGTGGTTGGAGGCGACCCCGCTCTAGCCG

   TCGACCTACCAGGAGGTTCAGAAGCCTAAGCCCAAG
   AGCTGGATGGTCCTCCAAGTCTTCGGATTCGGGTTC

   CCCGGGCACGGCCCCCGATCGGACTAATAG     3'   102 bp
   GGGCCCGTGCCGGGGGCTAGCCTGATTATCCTAG 5'   101 bp

028 M/030 M: PstI-HincII

5'     GTCCAGCAGCACCAGCTCTCCGGTC 3'   25 bp
3' ACGTCAGGTCGTCGTGGTCGAGAGGCCAG 5'   29 bp

040 M/041 M: PvuI-HincII

5'   CGTCCAGCAGCACCAGCTCTCCGGTC 3'   26 bp
3' TAGCAGGTCGTCGTGGTCGAGAGGCCAG 5'   28 bp

Example 5

~43 kD endoglucanase with different CBDs and B-regions:

In order to test the influence on the ~43 kD endoglucanase of
the different CBDs and B regions from the A region clones we
have substituted the original CBD and B region from ~43 kD
with the other C-terminal CBDs and B regions, i.e. A-1, A-8,
A-9, A-11, and A-19 (cf. Example 1). In order to test the
concept we have also made a construction where the 43 kD B
region has been deleted.

Fragments:

All fragments were made by PCR amplification using a
Perkin-Elmer/Cetus DNA Amplification System following the
manufacturer's instructions.
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1) a PCR fragment was made which covers the DNA from 56 
bp upstream of the Bam HI site on pSX 320 (Fig. 5) to 717 bp 
within the coding region of the -43 kD endoglucanase gene and 
at the same time introduces a Kpn I site at pos. 708 and a Sma 

5 I site at pos. 702 in the coding region which is at the very 
beginning of the B region. This PCR fragment was made with the 
primers NOR 1542 and NOR 3010 (see list of oligonucleotides 
below) . 

10 

2) A PCR fragment was made which includes the CBD and B 
region of A-l introducing a Kpn I site at the very beginning of 
the B region in frame with the Kpn I site introduced in 1) and 
introducing a Xho I site downstream of the coding region of the 

15 gene. Primers used: NOR 3012 upstream and NOR 3011 downstream. 

3) As 2) except that the fragment covered the CBD and B 
region of A-8 and the Xho I site in the expression vector 
downstream of gene. Primers: NOR 3017 and NOR 2516. 

20 

4) As 2) but with primers NOR 3016 and NOR 3015 
covering the CBD and B region from A-9. 

5) as 3) but with primers NOR 3021 and NOR 2516 covering 
25 the CBD and B region from A-ll. 

6) As 2) but with primers NOR 3032 and NOR 3022 covering 
the CBD and B region from A-19. 

30 7) A PCR fragment which includes the CBD from - 43 kD 

endoglucanase and the Xho I site downstream from the gene on 
pSX 320 introducing a Pvu II site at the very end of the B 
region. 

Primers: NOR3023 and NOR2516. 

35 

Combinations:

1) + 2): Inserted as Bam HI - Kpn I and Kpn I - Xho I into pToC
68 (described in DK 736/91) Bam HI - Xho I, thus coding for the
43 kD core enzyme with the CBD and B region from A-1.

1) + 3): As above, giving a 43 kD enzyme with the A-8 CBD and B
region.

1) + 4): As above, but with the A-9 CBD and B region.

1) + 5): As above, but with the A-11 CBD and B region.

1) + 6): As above, but with the A-19 CBD and B region.

1) + 7): Inserted as Bam HI - Sma I and Pvu II - Xho I into pToC
68 Bam HI - Xho I, thus coding for the 43 kD enzyme without the
B region.



Oligonucleotides (introduced restriction sites in parentheses):

NOR 1542: 5' - CGACAACATCACATCAAGCTCTCC - 3'

NOR 2516: 5' - CCATCCTTTAACTATAGCGA - 3'

NOR 3010: 5' - GCTGGTGCTGGTACCCGGGATCTGGACGGCAGGG - 3' (Kpn, Sma)

NOR 3011: 5' - GCATCGGTACCGGCGGCGGCTCCACTGGCG - 3' (Kpn)

NOR 3012: 5' - CTCACTCCATCTCGAGTCTTTCAATTTACA - 3' (Xho)

NOR 3015: 5' - CTTTTCTCGAGTCCCTTAGTTCAAGCACTGC - 3' (Xho)

NOR 3016: 5' - TGACCGGTACCGGCGGCGGCAACACCAACC - 3' (Kpn)

NOR 3017: 5' - TCACCGGTACCGGCGGTGGAAGCAACAATG - 3' (Kpn)

NOR 3021: 5' - TCTTCGGTACCAGCGGCAACAGCGGCGGCG - 3' (Kpn)

NOR 3022: 5' - CGCTGGGTACCAACAACAATCCTCAGCAGG - 3' (Kpn)

NOR 3023: 5' - CTCCCAGCAGCTGCACTGCTGAGAGGTGGG - 3' (Pvu II)

NOR 3032: 5' - CGGCCTCGAGACCTTACAGGCACTGCGAGT - 3' (Xho)
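
The restriction sites annotated on the primers can be located by a plain substring search. The sketch below is illustrative only: the dictionary names and the helper `find_sites` are assumptions introduced here, and the recognition sequences used (Kpn I GGTACC, Sma I CCCGGG, Xho I CTCGAG, Pvu II CAGCTG) are the standard ones for these enzymes rather than text taken from the specification.

```python
# Sketch: find the annotated restriction sites in the Example 5 primers.
# SITES, PRIMERS and find_sites are illustrative names, not from the source.
SITES = {
    "Kpn I": "GGTACC",
    "Sma I": "CCCGGG",
    "Xho I": "CTCGAG",
    "Pvu II": "CAGCTG",
}

PRIMERS = {
    "NOR 1542": "CGACAACATCACATCAAGCTCTCC",
    "NOR 2516": "CCATCCTTTAACTATAGCGA",
    "NOR 3010": "GCTGGTGCTGGTACCCGGGATCTGGACGGCAGGG",
    "NOR 3011": "GCATCGGTACCGGCGGCGGCTCCACTGGCG",
    "NOR 3012": "CTCACTCCATCTCGAGTCTTTCAATTTACA",
    "NOR 3015": "CTTTTCTCGAGTCCCTTAGTTCAAGCACTGC",
    "NOR 3016": "TGACCGGTACCGGCGGCGGCAACACCAACC",
    "NOR 3017": "TCACCGGTACCGGCGGTGGAAGCAACAATG",
    "NOR 3021": "TCTTCGGTACCAGCGGCAACAGCGGCGGCG",
    "NOR 3022": "CGCTGGGTACCAACAACAATCCTCAGCAGG",
    "NOR 3023": "CTCCCAGCAGCTGCACTGCTGAGAGGTGGG",
    "NOR 3032": "CGGCCTCGAGACCTTACAGGCACTGCGAGT",
}


def find_sites(sequence):
    """Return {enzyme: 0-based start index} for every site present."""
    return {
        enzyme: sequence.find(site)
        for enzyme, site in SITES.items()
        if site in sequence
    }


site_table = {name: find_sites(seq) for name, seq in PRIMERS.items()}
for name, hits in site_table.items():
    print(name, hits)
```

With the sequences as listed, NOR 3010 carries overlapping Kpn I and Sma I sites, the primers annotated "Xho" and "Pvu II" each carry that single site, and NOR 1542 and NOR 2516 carry none of the four.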

10 

Example 6 

Fusion of a bacterial catalytic domain to a fungal CBD 

The endoglucanase Endo 1 produced by Bacillus lautus NCIMB
40250 (described in PCT/DK91/00013) consists of a catalytic
domain (core) (Ala(32) - Val(555)) and a C-terminal cellulose
binding domain (CBD) (Gln(556) - Pro(700)) homologous to the CBD
of a B. subtilis endoglucanase (R.M. MacKay et al. 1986. Nucleic
Acids Res. 14, 9159-70). The CBD is proteolytically cleaved off
when the enzyme is expressed in B. subtilis or E. coli,
generating a CMC-degrading core enzyme. In this example this
core protein was fused with the B region and CBD of the ~43 kD
endoglucanase from Humicola insolens (described in DK 736/91).

25 

Construction of the fusion. 

The plasmid pCaHj 170 containing the cDNA gene encoding the
~43 kD endoglucanase was constructed as shown in Fig. 7. pCaHj
170 was digested with Xho II and Sal I. The 223 bp Xho II - Sal
I fragment was isolated and ligated into pUC 19 (Yanisch-Perron
et al. 1985. Gene 33, 103-119) digested with BamH I and Sal I.
The BamH I site was regenerated by this Xho II-BamH I ligation.
The resulting plasmid, IM 2, was digested with EcoR I and BamH
I and ligated with the linker NOR 3045 - NOR 3046:

NOR 3045  5' AATTCCGCGGAACGATATCTCCGA     3'
NOR 3046  3'     GGCGCCTTGCTATAGAGGCTCTAG 5'

Sites, in order along NOR 3045: EcoR I, Sac II, EcoR V, Mbo I

The resulting plasmid, IM 3, was digested with EcoR V and Sac II
and ligated to the 445 bp Hinc II - Sac II pPL 517 fragment.
pPL 517 contains the entire Bacillus Endo 1 gene
(PCT/DK91/00013). The product of this ligation was termed IM 4.
In order to replace the Bacillus signal peptide of Endo 1 with
the fungal signal peptide from the 43 kD endoglucanase, four
PCR primers were designed for "Splicing by Overlap Extension"
(SOE) fusion (R.M. Horton et al. 1989. Gene 77, 61-68). The 43
kD signal sequence was amplified from the plasmid pCaHj 109 (DK
736/91), introducing a Bcl I site in the 5' end and a 21 bp
homology to the Bacillus Endo 1 gene in the 3' end, using the 5'
primer NOR 3270 and the 3' primer NOR 3275. The part of the
Endo 1 gene 5' to the unique Sac II site was amplified using
the 5' primer NOR 3276, introducing a 21 bp homology to the 43
kD gene, and the 3' primer NOR 3271 covering the Sac II site.
The two PCR fragments were mixed, melted, annealed and filled up
with Taq polymerase (Fig. 9). The resulting hybrid was
amplified using the primers NOR 3270 and NOR 3271. The hybrid
fragment was digested with Bcl I and Sac II and ligated to the
676 bp Sac II - Sal I fragment from IM 4 and the Aspergillus
expression vector pToC 68 (DK 736/91) digested with BamH I. The
product of this ligation, pCaHj 180 (Fig. 10), contained an
open reading frame encoding the 43 kD signal peptide and the
first four N-terminal amino acids of the mature ~43 kD
endoglucanase (Met(1)-Arg(25)) fused to the core of Endo 1
(Ser(34)-Val(549)) followed by the peptide Ile-Ser-Glu (encoded
by the linker) fused to the 43 kD B region and CBD (Ile(233)-
Leu(285)). pCaHj 180 was used to transform Aspergillus oryzae
IFO 4177 using selection on acetamide by cotransformation with
pToC 90 (cf. DK 736/91) as described in published EP patent
application No. 238 023.



NOR 3270 5' TTGAATTCTGATCAAGATGCGTTCCTCCC 3'

NOR 3275 5' AATGGTGAAAGTGACATCACTCCTGCCATCAGCGGCAAGGGC 3'

NOR 3276 5' GCCCTTGCCGCTGATGGCAGGAGTGATGTCACTTTCACCATT 3'

NOR 3271 5' AGCGCGTCCGCGGTAGCTATG 3'

The sequence of the Endo 1 core and the ~43 kD CBD and B
region is shown in the appended Fig. 15A-D.
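
In an SOE fusion the two inner primers must overlap so that the two PCR products can anneal to each other. For the primers listed in this example, NOR 3275 and NOR 3276 are exact reverse complements, which a few lines of code can confirm. This is a minimal sketch; the helper name `reverse_complement` is an illustrative choice, not something from the specification.

```python
# Sketch: confirm that the inner SOE primers NOR 3275 and NOR 3276 overlap
# completely, i.e. that each is the reverse complement of the other.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


NOR_3275 = "AATGGTGAAAGTGACATCACTCCTGCCATCAGCGGCAAGGGC"
NOR_3276 = "GCCCTTGCCGCTGATGGCAGGAGTGATGTCACTTTCACCATT"

overlap_ok = reverse_complement(NOR_3276) == NOR_3275
print(len(NOR_3275), len(NOR_3276), overlap_ok)
```

Both primers are 42 nucleotides long, consistent with each contributing a 21 bp homology to the other gene as described in the text.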
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CLAIMS 



1. A cellulose- or hemicellulose-degrading enzyme which is
derivable from a fungus other than Trichoderma or
Phanerochaete, and which comprises a carbohydrate binding
domain homologous to a terminal A region of Trichoderma reesei
cellulases, which carbohydrate binding domain comprises the
following amino acid sequence

1                                   10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa
            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa
Xaa

or a subsequence thereof capable of effecting binding of the
enzyme to an insoluble cellulosic or hemicellulosic substrate.

2. An enzyme according to claim 1, which is derivable from a
strain of Humicola, Fusarium or Myceliophthora.

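3. An enzyme according to claim 1, wherein the variations in
the amino acid sequence shown in claim 1 are selected as
follows

in position 1, the amino acid is Trp or Tyr;
in position 2, the amino acid is Gly or Ala;
in position 7, the amino acid is Gln, Ile or Asn;
in position 8, the amino acid is Gly or Asn;
in position 9, the amino acid is Trp, Phe or Tyr;
in position 10, the amino acid is Ser, Asn, Thr or Gln;
in position 12, the amino acid is Pro, Ala or Cys;
in position 13, the amino acid is Thr, Arg or Lys;
in position 14, the amino acid is Thr, Cys or Asn;
in position 18, the amino acid is Gly or Pro;
in position 19, the amino acid (if present) is Ser, Thr, Phe,
Leu or Ala;
in position 20, the amino acid is Thr or Lys;
in position 24, the amino acid is Gln or Ile;
in position 26, the amino acid is Gln, Asp or Ala;
in position 27, the amino acid is Trp, Phe or Tyr;
in position 29, the amino acid is Ser, His or Ala; and/or
in position 32, the amino acid is Leu, Ile, Gln, Val or Thr.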

4. An enzyme according to claim 3, wherein the carbohydrate
binding domain comprises the following amino acid sequence

Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu
Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr Cys Val
Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys Cys Gln
Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr Cys Ala
Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln Cys Thr
Pro;

Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys Cys Ser
Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln Cys Leu
Asn;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn Cys Glu
Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln Cys
Ile;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn Cys Glu
Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys Val
Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Gln Asn Tyr Ser Gly Pro Thr Thr Cys Lys
Ser Pro Phe Thr Cys Lys Lys Ile Asn Asp Phe Tyr Ser Gln Cys
Gln; or

Trp Gly Gln Cys Gly Gly Asn Gly Trp Thr Gly Ala Thr Thr Cys Ala
Ser Gly Leu Lys Cys Glu Lys Ile Asn Asp Trp Tyr Tyr Gln Cys Val.
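
The consensus sequence of claim 1 can be expressed as a pattern and tested against the specific sequences listed above. The following is a hedged sketch only: it assumes that position 19 and position 33 are optional (position 19 is printed as a gap, "-", in some of the listed sequences, and most listed sequences end at position 32), and the names `THREE_TO_ONE`, `to_one_letter` and `CBD_PATTERN` are illustrative, not taken from the specification.

```python
# Sketch: the claim 1 consensus as a regular expression over one-letter codes.
# Assumptions: positions 19 and 33 are optional; "-" marks a gap.
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

CBD_PATTERN = re.compile(
    r"""
    ..QCGG      # positions 1-6: Xaa Xaa Gln Cys Gly Gly
    .{4}G       # positions 7-11: four Xaa, then Gly
    .{3}C       # positions 12-15: three Xaa, then Cys
    .{3}.?.C    # positions 16-21: Xaa x3, optional 19, Xaa, Cys
    .{3}N       # positions 22-25: three Xaa, then Asn
    ..Y.QC      # positions 26-31: Xaa Xaa Tyr Xaa Gln Cys
    ..?         # position 32, optional position 33
    """,
    re.VERBOSE,
)


def to_one_letter(three_letter_text):
    """Convert 'Trp Gly Gln ...' to 'WGQ...', skipping '-' gap markers."""
    tokens = three_letter_text.replace(";", " ").split()
    return "".join(THREE_TO_ONE[t] for t in tokens if t != "-")


def matches_consensus(one_letter_seq):
    """True if the whole sequence fits the consensus pattern."""
    return CBD_PATTERN.fullmatch(one_letter_seq) is not None


first = to_one_letter(
    "Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu "
    "Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys Leu"
)
gapped = to_one_letter(
    "Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr Cys Ala "
    "Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln Cys Thr Pro"
)
broken = first[:3] + "A" + first[4:]  # Cys at position 4 replaced by Ala

print(matches_consensus(first), matches_consensus(gapped),
      matches_consensus(broken))
```

The pattern only encodes the invariant residues of claim 1. The per-position choices of claim 3 could be added as character classes, for example `[WY]` at position 1, if a stricter check is wanted.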

5. An enzyme according to any of claims 1-4, which further
comprises an amino acid sequence which defines a linking B
region connecting the carbohydrate binding domain to the
catalytically active domain of the enzyme.

6. An enzyme according to claim 5, wherein the linking B region
is one which is enriched in the amino acids glycine and/or
asparagine and/or proline and/or serine and/or threonine and/or
glutamine.

7. An enzyme according to claim 6, wherein one or more of said
amino acids appear in short, repetitive units.

8. An enzyme according to any of claims 1-7, which comprises a
carbohydrate binding domain derived from one naturally
occurring cellulose- or hemicellulose-degrading enzyme, an
amino acid sequence defining a linking B region, which amino
acid sequence is derived from another naturally occurring
cellulose- or hemicellulose-degrading enzyme, as well as a
catalytically active domain derived from the enzyme supplying
either the carbohydrate binding domain or B region or from a
third enzyme.

9. An enzyme according to claim 8, wherein the catalytically
active domain is derived from an enzyme which does not, in
nature, comprise a carbohydrate binding domain or B region.

10. An enzyme according to any of claims 1-9 which is a
cellulase, e.g. an endoglucanase, cellobiohydrolase or
β-glucosidase.

11. A DNA construct which comprises a DNA sequence encoding an
enzyme according to any of claims 1-10.

12. An expression vector which carries an inserted DNA
construct according to claim 11.

13. A cell which is transformed with a DNA construct according
to claim 11 or with an expression vector according to claim 12.

14. A cell according to claim 13 which is a fungal cell, e.g.
belonging to a strain of Aspergillus, e.g. Aspergillus niger or
Aspergillus oryzae, or a yeast cell, e.g. belonging to a strain
of Saccharomyces, such as Saccharomyces cerevisiae.

15. A method of producing an enzyme according to any of claims
1-10, wherein a cell according to claim 13 or 14 is cultured
under conditions conducive to the production of the enzyme, and
the enzyme is subsequently recovered from the culture.

16. An agent for degrading cellulose or hemicellulose, the
agent comprising an enzyme according to any of claims 1-10.

17. An agent according to claim 16 comprising a combination of
two or more enzymes according to any of claims 1-10, or a
combination of one or more enzymes according to any of claims
1-10 with one or more other enzymes with cellulose- or
hemicellulose-degrading activity.

18. A carbohydrate binding domain homologous to a terminal A
region of Trichoderma reesei cellulases, which carbohydrate
binding domain comprises the following amino acid sequence

1                                   10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa
            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa
Xaa

or a subsequence thereof capable of effecting binding of a
protein to an insoluble cellulosic or hemicellulosic substrate.



19. A carbohydrate binding domain according to claim 18,
wherein the variations in the amino acid sequence shown in
claim 18 are selected as follows

in position 1, the amino acid is Trp or Tyr;
in position 2, the amino acid is Gly or Ala;
in position 7, the amino acid is Gln, Ile or Asn;
in position 8, the amino acid is Gly or Asn;
in position 9, the amino acid is Trp, Phe or Tyr;
in position 10, the amino acid is Ser, Asn, Thr or Gln;
in position 12, the amino acid is Pro, Ala or Cys;
in position 13, the amino acid is Thr, Arg or Lys;
in position 14, the amino acid is Thr, Cys or Asn;
in position 18, the amino acid is Gly or Pro;
in position 19, the amino acid (if present) is Ser, Thr, Phe,
Leu or Ala;
in position 20, the amino acid is Thr or Lys;
in position 24, the amino acid is Gln or Ile;
in position 26, the amino acid is Gln, Asp or Ala;
in position 27, the amino acid is Trp, Phe or Tyr;
in position 29, the amino acid is Ser, His or Tyr; and/or
in position 32, the amino acid is Leu, Ile, Gln, Val or Thr.

20. A carbohydrate binding domain according to claim 19,
wherein the carbohydrate binding domain comprises the following
amino acid sequence

Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu
Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr Cys Val
Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys Cys Gln
Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr Cys Ala
Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln Cys Thr
Pro;

Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys Cys Ser
Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln Cys Leu
Asn;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn Cys Glu
Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln Cys
Ile;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn Cys Glu
Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys Val
Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Gln Asn Tyr Ser Gly Pro Thr Thr Cys Lys
Ser Pro Phe Thr Cys Lys Lys Ile Asn Asp Phe Tyr Ser Gln Cys
Gln; or

Trp Gly Gln Cys Gly Gly Asn Gly Trp Thr Gly Ala Thr Thr Cys Ala
Ser Gly Leu Lys Cys Glu Lys Ile Asn Asp Trp Tyr Tyr Gln Cys Val.



WO 91/17244                                        PCT/DK91/00124

21. A linking B region derived from a cellulose- or
hemicellulose-degrading enzyme, said region comprising an amino
acid sequence enriched in the amino acids glycine and/or
asparagine and/or proline and/or serine and/or threonine and/or
glutamine.

22. A B region according to claim 21, wherein one or more of
said amino acids appear in short, repetitive units.

23. A B region according to claim 21 or 22, which comprises the
following amino acid sequence

Ala Arg Thr Asn Val Gly Gly Gly Ser Thr Gly Gly Gly Asn Asn Gly
Gly Gly Asn Asn Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro
Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Cys
Ser Pro Leu;

Pro Gly Gly Asn Asn Asn Asn Pro Pro Pro Ala Thr Thr Ser Gln Trp
Thr Pro Pro Pro Ala Gln Thr Ser Ser Asn Pro Pro Pro Thr Gly Gly
Gly Gly Gly Asn Thr Leu His Glu Lys;

Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn
Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly
Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu;

Val Phe Thr Cys Ser Gly Asn Ser Gly Gly Gly Ser Asn Pro Ser Asn
Pro Asn Pro Pro Thr Pro Thr Thr Phe Ile Thr Gln Val Pro Asn Pro
Thr Pro Val Ser Pro Pro Thr Cys Thr Val Ala Lys;

Pro Ala Leu Trp Pro Asn Asn Asn Pro Gln Gln Gly Asn Pro Asn Gln
Gly Gly Asn Asn Gly Gly Gly Asn Gln Gly Gly Gly Asn Gly Gly Cys
Thr Val Pro Lys;

Pro Gly Ser Gln Val Thr Thr Ser Thr Thr Ser Ser Ser Ser Thr Thr
Ser Arg Ala Thr Ser Thr Thr Ser Ala Gly Gly Val Thr Ser Ile Thr
Thr Ser Pro Thr Arg Thr Val Thr Ile Pro Gly Gly Ala Ser Thr Thr
Ala Ser Tyr Asn;

Glu Ser Gly Gly Gly Asn Thr Asn Pro Thr Asn Pro Thr Asn Pro Thr
Asn Pro Thr Asn Pro Thr Asn Pro Trp Asn Pro Gly Asn Pro Thr Asn
Pro Gly Asn Pro Gly Gly Gly Asn Gly Gly Asn Gly Gly Asn Cys Ser
Pro Leu; or

Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro Val Asn Gln
Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser Ser Pro Pro
Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu Arg.



[Fig. 2 (sheet 1/22): plasmid construction scheme. Legible
labels: BamHI, HindIII, STOP, AvaII, Klenow + 4 dNTPs, Bst
BI/BamHI, Sal I, Bal I, EcoRI, NcoI; fragments Bst BI - AvaII
(bl.e., 860 bp), BamHI - Bst BI (700 bp), BamHI/NruI (4.5 kb),
Bst BI - BamHI (6.0 kb), Bst BI - Bal I (620 bp); Bal I - BamH I
linker:

5' GATCCACCATGG 3'  NOR-969
3'     GTGGTACC 5'  NOR-970]

[Fig. 3: plasmid construction scheme. Legible labels: NarI,
EcoRI, SalI, BamHI, NcoI, BglII; BstEII-BglII fragment (5.7 kb);
+ BstEII-BamHI linker 2492/2493; + BstEII-BamHI linker
2645/2646; pHW704; EG I', truncated; EG I, complete.]

[Fig. 5: restriction map. Site positions: NdeI 183; BbeI 235;
NarI 235; BstXI 391; SmaI 548; XmaI 548; PpuMI 638; BspMII 963;
BglII 1148; Asp718 1196; KpnI 1196; NaeI 1800; BamHI 2103;
BssHII 2810; StuI 3220; SalI 3240; SphI 3252; HindIII 3285;
AflII 3644; ScaI 5015; XmnI 5132; AatII 5455.]

[Fig. 6: plasmid construction scheme. Legible labels: + 040/041
M linker PvuI-HincII; pHW775; HincII-EcoRI fragment (2.6 kb);
PvuI, HincII, BglII, EcoRI, BamHI, BstEII, NcoI.]

[Plasmid construction scheme, caption not recovered. Legible
labels: XhoI, XhoII, ApR, ORI, BamH I, SalI, 43k, pUC19,
SalI-XhoII, BamHI-SalI; 1. Digestion with BamH I, EcoRI; 2.
Insertion of linker NOR3045-NOR3046; EcoRI, SacII, EcoRV,
HincII, Endo 1, EcoRV/HincII.]

[Fig. 9: splicing by overlap extension scheme. Legible labels:
NOR3270, NOR3276, NOR3271; BclI/BamHI; 43 kdal; Endo 1; Sal/Xho;
BclI; SacII; "MIX, MELT, ANNEAL"; CH1091; "72°C, NUCLEOTIDES,
TAQ POLYMERASE"; CH1094; 43 kdal signal; Endo 1 N terminal.]

[Fig. 10: plasmid map. Legible labels: BclI, SacII, 43 kdal
signal, Endo 1 N terminal; Sal I, Sac II; SacII, EcoRV/HincII.]
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agaccggaattcgcggccgccatctatccaacggtctagcttcacttcacaatgtatcgc 

m y r 

atcgtcgcaaccgcctcggctcttattgccgctgctcgggctcaacaggtctgctctttg 

IVATASALIAAARAQQVCSL 

aacaccgagaccaagcctgccttgacctggtccaagtgtacatccagcggctgcagcgat 

NT ETKPALTWSKCTSSGCSD 

gtcaagggctccgttgttattgatgccaactggcgatggactcaccagacttctgggtct 

VKGSVVIDANWRWTHQTSGS 

accaactgttacaccggaaacaagtgggacacctccatctgcactgatggcaagacctgc 

TNCYTGNKWDTSICTDGKTC 

GCCGAAAAGTGCTGTCTTGATGGCGCCGACTATTCTGGTACCTACGGAATCACCTCCAGC 

AEKCCLDGADYSGTYGITSS 

qqcaaccagctcagtcttggattcgtcaccaacggtccctacagcaagaacatcggcagc 

G NQLSLGFVTNGPYSKNIGS 

cgaacctacctcatggagaacgagaacaccatccagatgttccagcttctgggcaacgag 

R TYLMENENTYQHFQLLGNE 

ttcacctttgatgtcgatgtctctggtatcggctgcggtctgaacggtgcccctcacttc 

FTFDVDVSGIGCGLNGAPHF 

otcagcatggacgaggatggtggcaaggccaagtactccggaaacaaggccggagccaag 

VS M DE DGGKAKYSGNKAGAK 

tacggaactggcTACtGTGATGCCCAgTGCCCTCGTGATGTCAAGTTCATCAACGGAGTT 

YGTGYCDAQCPRDVKFINGV 

GCCAACTCTGAGGGCTGGAAGCCCTCTGACAGTGATGTCAACGCtggtgttggtaatCtg 

AN SEGWKPSDSDVNAGVGNL 

ggcacctgctgccccgagatggatatctgggaggccaactccatctccaccgccttcact 

G TCCPEMDIWEANSISTAFT 

cctcatccttgcaccaagctcacacagcactcttgcactggcgactcttgtggtggaacc 

PHPCTKLTQHSCTGDSCGGT 

tactctagtgaccgatatggcggtacttgcgatgccgacggttgtgatttcaatgcctac 

YS SDRYGGTCDADGCDFNAY 

cgtcagggcaacaagaccttctacggtcctggatccaacttcaacatcgacaccaccaag 

R Q GNKTFYGPGSNFNIDTTK 

aagatgactgttgtcactcagttccacaagggcAGCAAcGGACGTCTTTCTGAGATCACC 

K MTVVTQFHKGSNGRLSEIT 

CGTCTGTACGTCCAGAACGGCAAGGTCATTGCCAACTCAGAGTCCAAGATTGCAGGCAAC 

RLYVQNGKVIANSESKIAGN 

CCCGGTAGCTCTCTCACCTCTGACTTCTGCTCCAAGCAGAAGAGCGTCTTTGGCGATATC 

PGSSLTSDFCSKQKSVFGDI 

GATGACTTCTCTAAGAAGGGTGGCTGGAACGgCATGAGCGATGCTCTCTCTGCCCCTATG 

DD FSKKGGWNGMSDALSAPM 

GTTCTTGTTATGTCTCTCTGGCACGACCACCACTCCAAcATGCtcTGGCTgGACtCtacc 

VLVMSLWHDHHS NKLWLDST 

tacccaaccgactctaccaaggttggatctcaacgaggttcttgcgctaccacctctggc 

YPTDSTKVGSQRGSCATTSG 

aaqccctccgaccttgagcgagatgttcccaactccaaggtttccttctccaacatcaAG 

K PSDLERDVPNSKVSFSNIK 

TTCGGTCCCATCGGAAGCACCTACAAGAGCGACGGCACCACCCCCAACCCCCCTgCCAGC 

FGPIGSTYKSDGTTPNPPAS 

AGCAGCACCACTGGTTCTTCCACTCCCACCAACCCCCCTGCCGGTAGCGTCGACCAATGG 

SSTTGSSTPTNPPAGSVDQW 

GGACAgTGcGGTGGCCAgaactacagcggccccacgacctgcaagtctcctttcacctgc 

GQCGGQNYSGPTTCKSPFTC 

aagaagatcaacgacttctactcccagtgtcagtaaaggggctgccgagctatctagcat 

KK INDFYSQCQ 

gagattgagaaacgatgtgatgagtggacgatcaaggagaagtgtgtggatgatatgaac 
ttgatgtgggaggac

Fig. 11
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gaattcgcggccgcctgcttcgaagcatcagctcattgagatcagtcaaaatgcatacc 

M H T 

ctttcggttctcctcgctctcgctcccgtgtccgcccttgctcaggctcccatctgggga 

LSVLLALAPVSALAQAPIWG 

cagtgcggtggcaatggttggaccggtgctacaacctgcgctagtggtctgaagtgtgag 

QCGGNGWTGATTCASGLK CE 

aagatcaacgactggtactatcagtgtgttcctggatctggaggatctgaaccccagcct 

KINDWYYQCVPGSGGSEPQP 

tcgtcaactcagggtggtggcactcctcagcctactggcggtaacagcggcggcactggt 

SSTQGGGTPQPTGGNSGGTG 

ctcgacgccaaattcaaggccaagggcaagcagtactttggtaccgagattgaccactac 

LDAKFKAKGKQYFGTEIDHY 

caccttaacaacaatcctctgatcaacattgtcaaggcccagtttggccaagtgacatgc 

HLNNNPLINIVKAQFGQVTC 

gagaacagcatgaagtgggatgccattgagccttcacgcaactccttcaccttcagtaac 

ENSMKWDAIEPSRNSFTFSN 

gctgacaaggtcgtcgacttcgccactcagaacggcaagctcatccgtgGCCACACTCTT 

ADKVVDFATQNGKLIRGHTL 

CTCTGGCACTCTCAGCTGCCTCAGTGGGTTCAGAACATCAACGATCGCTCTACCCTCACC 

LWHSQLPQWVQNINDRSTLT 

GCGGTCATCGAGAACCACGTCAAGACCATGGTCACCCGCTACAAGGGCAAGATCCTCCAG 

AVIENHVKTMVTRYKGKILQ 

TGGGACGTTGTCAACAACGAGATCTTCGCTGAGGACGGTAACCTCCGCGACAGTGTCTTC 

WDVVNNEIFAEDGNLRDSVF 

AGCCGAGTTCTCGGTGAGGACTTTGTCGGTATTGCTTTCCGCGCTGCCCGCGCCGCTGAT 

SRVLGEDFVGIAFRAARAAD 

CCCGCTGCCAAGCTCTACATCAACGATTATAACCTCGACAAGTCCGACTATGCTAAGGTC 

PAAKLYINDYNLDKSDYAKV 

ACCCGCGGAATGGTCGCTCACGTTAATAAGTGGATTGCTGCTGGTATTCCCATCGACGGT 

TRGMVAHVNKWIAAGIPIDG 

ATTGGATCTCAGGGCCATCTTGCTGCTCCTAGTGGCTGGAACCCTGCCTCTGGTGTTCCT 

IGSQGHLAAPSGWNPASGVP 

GCTGCTCTCCGAGCTCTTGCCGCCTCGGACGCCAAGGAGATTGCTATcactgagcttgat 

AALRALAASDAKEIAITELD 

attgccggtgccagtgctaacgattaccttactgtcatgaacgcttgccttgccgttccc 

IAGASANDYLTVMNACLAVP 

aagtgtgtcggcatcactgtctggggtgtctctgacaaggactcgtggcgacctggtgac 

KCVGITVWGVSDKDSWRPGD 

aaccccctcctctacgacagcaactaccagcccaaggctgctttcaatgccttggctaac 

NPLLYDSNYQPKAAFNALAN 

gctctgtgagctgttgttgatgtatgtcgctggatcatacaacgaaacgtcctagttgga 

A L 

taaagcgttgatggtagaatgat 



Fig. 12 
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gaattcgcggccgcctagataagtcactacctgatctctgaataatctttcatcatgaag 

M K 

tctctctcactcatcctctcagccctggctgtccaggtcgctgttgctcaaacccccgac 

SLSLILSALAVQVAVAQTPD 

aaggccaaggagcagcaccccaagctcgagacctaccgctgcaccaaggcctctggctgc 

K AK E Q HPKLETYRCTKASGC 

aagaagcaaaccaactacatcgtcgccgaCgcaggtattcacggcattCgcagaagcgcc 

KKQTNYIVADAGIHGIRRSA 

GGCTGCGGTGACTGGGGTCAAAAGCCCAACGCCACAGCCTGCCCCGATGAGGCATCCTGC 

GCGDWGQKPNATACPDEASC 

GCTAAGAACTGTATCCTCAGTGGTATGGACTCAAACGCTTACAAGAACGCTGGTATCACT 

AKNCILSGMDSNAYKNAGIT 

ACTTCTGGCAACAAGCTTCGTCTTCAGCAGCTTATCAACAACCAGCTTGTTTCTCCTCGG 

TSGMKLRLQQLI NNQLVS PR 
GTTTATCTGCTTGAGGAGAACAAGAAGAAGTATGAGATGCTTCAGCTCACTGGTACTGAA 

VYLLEENKKKYEMLHLTGTE 
TTCTCTTTCGACGTTGAGATGGAGAAGCTTCCTTGTGGTATGAATGGTGCTTTGTACCTT 
FSFDVEMEKLPCGMNGALYL 
TCCGAGATGCCACAGGATGGTGGTAAGAGCACGAGCCGAAACAGCAAGGCTGGTGCCTAC 

S EMPQDGGKSTS R NSKAGAY 

TATGGTGCTGGATACTGTGATGCTCAGTGCTACGTCactcctttcATCAACGGAGTTGGC 

YGAGYCDAQCYVTPFINGVG 

AACATCAAGGGACAGGGTGTCTGCTGTAACGAGCTCGACATCTGGGAGGCCAACTCCCGC 

NIKGQGVCCNELD IWEANSR 

GCAACTCACATTGCTCCTCACCCTTGCAGCAAGCCCGGCCTCTACGGCTGCACAGGCGAT 

ATHIAPHPCSKPGLYGCTGD 

GAGTGCGGCAGCTCCGGTTTCTGCGACAAGGCCGGCTGCGGCTGGAACCACAACCGCATC 

ECGSSGICDKAGCGWNHNRI 

AACGTGACCGACTTCTACGGccgcggCAAGCAGTACAAGGTCGACAGCACCCGCAAGTTC 

NVTDFYGRGKQYKVDSTRKF 

ACCGTGACATCTCAGTTCGTCGCCAACAAGCAGGGTGATCTCATCGAGCTGCACCGCCAC 

TVTSQFVANKQGDLIELHRH 

TACATCCAGGACAACAAGGTCAtcgagtctgctgtcgtcaacatctccggccctcccaag 

YIQDNKVIESAVVNISGPPK 

atcaacttcatcaatgacaagtactgcgctgccaccggcgccaacgagtacatgcgcctc 

INFINDKYCAATGANEYMRL 

ggcqgtactaagcaaatgggcgatgccatgtcccgcggaatggttctcgccatgagcgtc 

G G TK QMGDAMSRGMVLAMSV 

tqgtggagcgagggtgatttcatggcctggttggatcagggtgttgctggaccctgtgac 

WHS E G DFMAWLDQGVAGPCD 

qccaccgagggcgatcccaagaacatcgtcaaggtgcagcccaaccctgaagtgacattt 

ATE G DPKNIVKVQPNPEVTF 

agcaacatcagaattggagagattggatctacttcatcggtcaaggctcctgcgtatcct 

S NIR IGEIGS TSSVKAPAYP 

ggtcctcaccgcttgtaaaaacatcaaacaacaccgtgtccaatatggATCTTAGTGTCC 

G P H R L • 

ACTTGCTGGGAAGCTATTGGAGCACATATGCAAAACAGATGTCCACTAGCTTGACACGTA 
TGTCGGGGCAAAAAAATCTCTITCTAGGATAGGAGAACATATTGGGTGTTTGGACTTGTA 
TATAAATGATACATTTTTCATATTATATTATTTTCAACATATTTTATTTCACGAAAAAAA 

AAAAAAAAAAAAAAAAAAAAAAAA 

Fig. 13 
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I ! ! j 1 1 

TTTCTTCGTCGAGCTCGAGTCGTCCGCCGTCTCCTCCTCCTCCTCCTTCCAGTCTTTGAG 
70 80 90 100 110 120 

ill!!! 

TTCCTTCGACCTGCAGCGTCCTGAACAACTCGCTCTAGCTCAACAACCATGGCTCGCGGT 

MetAlaArgGly 

130 140 150 160 170 180 

! ! ! 1 i i 

ACCGCTCTCCTCGGCCTGACCGCGCTCCTCCTGGGGCTGGTCAACGGCCAGAAGCCTGGT 
ThrAlaLeuLeuGlyLeuThrAlaLeuLeuLeuGlyLeuValAsnGlyGlnLysProGly 

190 200 210 220 230 240 

!!!!!! 

GAGACCAAGGAGGTTCACCCCCAGCTCACGACCTTCCGCTGCACGAAGAGGGGTGGTTGC 
GluThrLysGluValHisProGlnLeuThrThrPheArgCysThrLysArgGlyGlyCys 

250 260 270 280 290 300 

! ! . ! ! ! 

AAGCCGGCGACCAACTTCATCGTGCTTGACTCGCTGTCGCACCCCATCCACCGCGCTGAG 
LysProAlaThrAsnPhelleValLeuAspSerLeuSerHisProIleHisArgAlaGlu 

Fig. 14A 
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310 
I 

GGCCTGGGCCCTGGCGGCTGCGGCGACTGGGGCAACCCGCCGCCCAAGGACGTCTGCCCG 
GlyLeuGlyProGlyGlyCysGlyAspTrpGlyAsnProProProLysAspValCysPro 

370 380 390 400 410 420 

iii ii J 

i i i i i i 

GACGTCGAGTCGTGCGCCAAGAACTGCATCATGGAGGGCATCCCCGACTACAGCCAGTAC 
AspValGluSerCysAlaLysAsnCysIleMetGluGlylleProAspTyrSerGlnTyr 

430 440 450 460 470 480 

1 1 ! ! ! ! 

i i i i i i 

GGCGTCACCACCAACGGCACCAGCCTCCGCCTGCAGCACATCCTCCCCGACGGCCGCGTC 
GlyValThrThrAsnGlyThrSerLeuArgLeuGlnHisIleLeuProAspGlyArgVal 

490 500 510 520 530 540 

i I i i i I 

l I I I « i 

CCGTCGCCGCGTGTCTACCTGCTCGACAAGACGAAGCGCCGCTATGAGATGCTCCACCTG 
ProSerProArgValTyrLeuLeuAspLysThrLysArgArgTyrGluMetLcuHisLeu 

550 560 570 580 590 600 

I i i i l i 

I l I I l I 

ACCGGCTTCGAGTTCACCTTCGACGTCGACGCCACCAAGCTGCCCTGCGGCATGAACAGC 
ThrGlyPheGluPheThrPheAspValAspAlaThrLysLeuProCysGlyMetAsnSer 

610 620 630 640 650 660 

i i I i I i 

i ! i i i i 

GCTCTGTACCTGTCCGAGATGCACCCGACCGGTGCCAAGAGCAAGTACAACTCCGGCGGT 
AlaLeuTyrLeuSerGluMetHisProThrGlyAlaLysSerLysTyrAsnSerGlyGly 

Fig. 14B 
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GCCTACTACGGTACTGGCTACTGCGATGCTCAGTGCTTCGTGACGCCCTTCATCAACGGC 
AlaTyrTyrGlyThrGlyTyrCysAspAlaGlnCysPheValThrProPhelleAsnGly 

730 740 750 760 770 780 

i i i i i i 

i i i i i i 

TTGGGCAACATCGAGGGCAAGGGCTCGTGCTGCAACGAGATGGATATCTGGGAGGTCAAC 
LeuGlyAsnlleGluGlyLysGlySerCysCysAsnGluMetAspIleTrpGluValAsn 

790 800 810 820 830 840 

i i I i I i 

I I I » I i 

TCGCGCGCCTCGCACGTGGTTCCCCACACCTGCAACAAGAAGGGCCTGTACCTTTGCGAG 
SerArgAlaSerHisValValProHisThrCysAsnLysLysGlyLeuTyrLeuCysGlu 

850 860 870 880 890 900 

i i i i i i 

i i i > i i 

GGTGAGGAGTGCGCCTTCGAGGGTGTTTGCGACAAGAACGGCTGCGGCTGGAACAACTAC 
GlyGluGluCysAlaPheGluGlyValCysAspLysAsnGlyCysGlyTrpAsnAsnTyr 

910 920 930 940 950 960 

I i i i I I 

I i i i I i 

CGCGTCAACGTGACTGACTACTACGGCCGGGGCGAGGAGTTCAAGGTCAACACCCTCAAG 
ArgValAsnValThrAspTyrTyrGlyArgGlyGluGluPheLysValAsnThrLeuLys 

970 980 990 1000 1010 1020 

i i i i i i 

i i i i i t 

CCCTTCACCGTCGTCACTCAGTTCTTGGCCAACCGCAGGGGCAAGCTCGAGAAGATCCAC 
ProPheThrValValThrGlnPheLeuAlaAsnArgArgGlyLysLeuGluLysIleHis 

Fig. 14C 
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1030 1040 1050 1060 1070 1080 

lit ill 
ill ill 

CGCTTCTACGTGCAGGACGGCAAGGTCATCGAGTCCTTCTACACCAACAAGGAGGGAGTC 
ArgPheTyrValGlnAspGlyLysVallleGluSerPheTyrThrAsnLysGluGlyVal 

1090 1100 1110 1120 1130 1140 

iii iii 
lit iii 

CCTTACAGCAACATGATCGATGACGAGTTCTGCGAGGCCACCGGCTCCCGCAAGTACATG 
ProTyrThrAsnMetlleAspAspGluPheCysGluAlaThrGlySerArgLysTyrMet 

1150 1160 1170 1180 1190 1200 

ill iii 
ill ill 

GAGCTCGGCGCCACCCAGGGCATGGGCGAGGCCCTCACCCGCGGCATGGTCCTGGCCATG 
GluLeuGlyAlaThrGlnGlyMetGlyGluAlaLeuThrArgGlyMetValLeuAlaMet 

1210 1220 1230 1240 1250 1260 

i i i i i I 

iii iii 

AGCATCTGGTGGGACCAGGGCGGCAACATGGAGTGGCTCGACCACGGCGAGGCCGGCCCC 
SerlleTrpTrpAspGlnGlyGlyAsnMetGluTrpLeuAspHisGlyGluAlaGlyPro 

1270 1280 1290 1300 1310 1320 

iii iii 
iii iii 

TGCGCCAAGGGCGAGGGCGCCCCGTCCAACATTGTCCAGGTTGAGCCCTTCCCCGAGGTC 
CysAlaLysGlyGluGlyAlaProSerAsnlleValGlnValGluProPheProGluVal 

1330 1340 1350 1360 1370 1380 

i i i iii 

ill iii 

ACCTACACCAACCTCCGCTGGGGCGAGATCGGCTCGACCTACCAGGAGGTTCAGAAGCCT 
ThrTyrThrAsnLeuArgTrpGlyGluIleGlySerThrTyrGlnGluValGlnLysPro 



Fig. 14D

1390 1400 1410 1420 1430 1440

AAGCCCAAGCCCGGCCACGGCCCCCGGAGCGACTAAGTGGTGATGGGATAGAGGGATAGA 
LysProLysProGlyHisGlyProArgSerAspEND 

1450 1460 1470 1480 1490 1500 

ATAGTGGATAGCACATAGATCGGCGGTTTTGGATAGTTTAATACATTCCGTTGCCGTTGT

GAAAAAAAAA - poly-A 



Fig. 14E

ATGCGTTCCTCCCCCCTCCTCCCGTCCGCCGTTGTGGCCGCCCTGCCGGTGTTGGCCCTT

METArgSerSerProLeuLeuProSerAlaValValAlaAlaLeuProValLeuAlaLeu 
43 kdal signal peptide and N terminal

70 80 90 100 110 120 

GCCGCTGATGGCAGGAGTGATGTCACTTTCACGATTAATACGCAGTCGGAACGTGCAGCG 

AlaAlaAspGlyArgSerAspValThrPheThrIleAsnThrGlnSerGluArgAlaAla
N terminal 

130 140 150 160 170 180 

ATCAGCCCCAATATTTACGGAACCAATCAGGATCTGAGCGGGACGGAGAACTGGTCATCC 

IleSerProAsnIleTyrGlyThrAsnGlnAspLeuSerGlyThrGluAsnTrpSerSer

190 200 210 220 230 240 

CGCAGGCTCGGAGGCAACCGGCTGACGGGTTACAACTGGGAGAACAACGCATCCAGCGCC 

ArgArgLeuGlyGlyAsnArgLeuThrGlyTyrAsnTrpGluAsnAsnAlaSerSerAla 

250 260 270 280 290 300 

GGAAGGGACTGGCTTCATTACAGCGATGATTTTCTCTGCGGCAACGGTGGTGTTCCAGAC 

GlyArgAspTrpLeuHisTyrSerAspAspPheLeuCysGlyAsnGlyGlyValProAsp 

Endo 1 core 

310 320 330 340 350 360 

ACCGACTGCGACAAGCCGGGGGCGGTTGTTACCGCTTTTCACGATAAATCTTTGGAGAAT 

ThrAspCysAspLysProGlyAlaValValThrAlaPheHisAspLysSerLeuGluAsn 

370 380 390 400 410 420 

GGAGCTTACTCCATTGTAACGCTGCAAATGGCGGGTTATGTGTCCCGGGATAAGAACGGT 

GlyAlaTyrSerIleValThrLeuGlnMETAlaGlyTyrValSerArgAspLysAsnGly

430 440 450 460 470 480 

CCAGTTGACGAGAGTGAGACGGCTCCGTCACCGCGTTGGGATAAGGTCGAGTTTGCCAAA 
ProValAspGluSerGluThrAlaProSerProArgTrpAspLysValGluPheAlaLys 

Fig. 15A

AATGCGCCGTTCTCCCTTCAGCCTGATCTGAACGACGGACAAGTGTATATGGATGAAGAA
AsnAlaProPheSerLeuGlnProAspLeuAsnAspGlyGlnValTyrMETAspGluGlu 

550 560 570 580 590 600 

GTTAACTTCCTGGTCAACCGGTATGGAAACGCTTCAACGTCAACGGGCATCAAAGCGTAT 
ValAsnPheLeuValAsnArgTyrGlyAsnAlaSerThrSerThrGlyIleLysAlaTyr

610 620 630 640 650 660 

TCGCTGGATAACGAGCCGGCGCTGTGGTCTGAGACGCATCCAAGGATTCATCCGGAGCAG 
SerLeuAspAsnGluProAlaLeuTrpSerGluThrHisProArgIleHisProGluGln

670 680 690 700 710 720 

TTACAAGCGGCAGAACTCGTCGCTAAGAGCATCGACT 

LeuGlnAlaAlaGluLeuValAlaLysSerIleAspLeuSerLysAlaValLysAsnVal

730 740 750 760 770 780 

GATCCGCATGCCGAAATATTCGGTCCTGCCCTTTACGGTTTCGGCGCATATTTGTCTCTG 

AspProHisAlaGluIlePheGlyProAlaLeuTyrGlyPheGlyAlaTyrLeuSerLeu 

790 800 810 820 830 840 

CAGGACGCACCGGATTGGCCGAGTTTGCAAGGCAACTACAGCTGGTTTATCGATTACTAT 

GlnAspAlaProAspTrpProSerLeuGlnGlyAsnTyrSerTrpPheIleAspTyrTyr

850 860 870 880 890 900 

CTGGATCAGATGAAGAATGCTCATACGCAGAACGGCAAAAGATTGCTCGATGTGCTGGAC 

LeuAspGlnMETLysAsnAlaHisThrGlnAsnGlyLysArgLeuLeuAspValLeuAsp 

910 920 930 940 950 960 

GTCCACTGGTATCCGGAAGCACAGGGCGGAGGCCAGCGAATCGTCTTTGGCGGGGCGGGC 
ValHisTrpTyrProGluAlaGlnGlyGlyGlyGlnArgIleValPheGlyGlyAlaGly

Fig. 15B

970 980 990 1000 1010 1020

AATATCGATACGCAGAAGGCTCGCGTACAAGCGCCAAGATCGCTATGGGATCCGGCTTAC 
AsnIleAspThrGlnLysAlaArgValGlnAlaProArgSerLeuTrpAspProAlaTyr

1030 1040 1050 1060 1070 1080 

CAGGAAGACAGCTGGATCGGCACATGGTTTTCAAGCTACTTGCCCTTAATTCCGAAGCTG 

GlnGluAspSerTrpIleGlyThrTrpPheSerSerTyrLeuProLeuIleProLysLeu

1090 1100 1110 1120 1130 1140 

CAATCTTCGATTCAGACGTATTATCCGGGTACGAAGCTGGCGATCACAGAGTTCAGCTAC 
GlnSerSerIleGlnThrTyrTyrProGlyThrLysLeuAlaIleThrGluPheSerTyr

1150 1160 1170 1180 1190 1200 

GGCGGAGACAATCACATTTCGGGAGGCATAGCTACCGCGGACGCGCTCGGCATTTTTGGA 

GlyGlyAspAsnHisIleSerGlyGlyIleAlaThrAlaAspAlaLeuGlyIlePheGly
1210 1220 1230 1240 1250 1260 

AAATATGGCGTTTATGCCGCGAATTACTGGCAGACGGAGGACAATACCGATTATACCAGC 
LysTyrGlyValTyrAlaAlaAsnTyrTrpGlnThrGluAspAsnThrAspTyrThrSer 

1270 1280 1290 1300 1310 1320 

GCTGCTTACAAGCTGTATCGCAACTACGACGGCAATAAATCGGGGTTCGGCTCGATCAAA 

AlaAlaTyrLysLeuTyrArgAsnTyrAspGlyAsnLysSerGlyPheGlySerIleLys

1330 1340 1350 1360 1370 1380 

GTGGACGCCGCTACGTCCGATACGGAGAACAGCTCGGTATACGCTTCGGTAACTGACGAG 

ValAspAlaAlaThrSerAspThrGluAsnSerSerValTyrAlaSerValThrAspGlu 

1390 1400 1410 1420 1430 1440 

GAGAATTCCGAACTCCACCTGATCGTGCTGAATAAAAATTTCGACGATCCGATCAACGCT 
GluAsnSerGluLeuHisLeuIleValLeuAsnLysAsnPheAspAspProIleAsnAla

Fig. 15C

ACTTTCCAGCTGTCTGGTGATAAAACCTACACATCCGGGAGAGTATGGGGCTTCGACCAA

ThrPheGlnLeuSerGlyAspLysThrTyrThrSerGlyArgValTrpGlyPheAspGln 
1510 1520 1530 1540 1550 1560 

ACCGGATCCGACATTACGGAACAAGCAGCTATAACGAATATTAACAACAATCAATTCACG 

ThrGlySerAspIleThrGluGlnAlaAlaIleThrAsnIleAsnAsnAsnGlnPheThr
1570 1580 1590 1600 1610 1620 

TATACGCTTCCTCCATTGTCGGCTTACCACATTGTTCTGAAAGCGGATAGCACCGAACCG 
TyrThrLeuProProLeuSerAlaTyrHisIleValLeuLysAlaAspSerThrGluPro

1630 1640 1650 1660 1670 1680 

GTCATCTCCGAGATCCCCTCCAGCAGCACCAGCTCTCCGGTCAACCAGCCTACCAGCACC 

ValIleSerGluIleProSerSerSerThrSerSerProValAsnGlnProThrSerThr
Linker 43 kdal B region 

1690 1700 1710 1720 1730 1740 

AGCACCACGTCCACCTCCACCACCTCGAGCCCGCCAGTCCAGCCTACGACTCCCAGCGGC 

SerThrThrSerThrSerThrThrSerSerProProValGlnProThrThrProSerGly 

1750 1760 1770 1780 1790 1800 

TGCACTGCTGAGAGGTGGGCTCAGTGCGGCGGCAATGGCTGGAGCGGCTGCACCACCTGC 

CysThrAlaGluArgTrpAlaGlnCysGlyGlyAsnGlyTrpSerGlyCysThrThrCys 

43 kdal A region

1810 1820 1830 1840 1850 

GTCGCTGGCAGCACTTGCACGAAGATTAATGACTGGTACCATCAGTGCCTGTAG 

ValAlaGlySerThrCysThrLysIleAsnAspTrpTyrHisGlnCysLeu— 

Fig. 15D